Azure Data Factory Interview Questions Part 1

Interview Questions

1. Why do we need Azure Data Factory?

  • The amount of data generated these days is massive, and it comes from a variety of sources. A few things need to be taken care of when we migrate this data to the cloud.
    Data can take any form because it comes from several sources, and each source transfers the data in a different way and in a different format. When we move this data to the cloud or to a specific storage location, we must ensure that it is well managed; that is, we must transform the data and remove any unneeded parts. In terms of data movement, we must ensure that data is collected from the different sources and brought to one common location where it can be stored. Azure Data Factory is the service that handles this movement and transformation.

2. What is Azure Data Factory?

  • It is a cloud-based integration service for orchestrating and automating data movement and data transformation.
  • You can use Azure Data Factory to create and schedule data-driven workflows (called pipelines) that ingest data from various data sources, and to process and transform the data using compute services such as HDInsight Hadoop, Spark, Azure Data Lake Analytics, and Azure Machine Learning.
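To make the idea of a pipeline concrete, here is a minimal sketch in plain Python. It is not the Data Factory SDK or its pipeline definition format; `Activity`, `Pipeline`, `ingest`, and `transform` are hypothetical names used only to show how a pipeline chains an ingest step and a transform step.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical types for illustration only; not part of any Azure SDK.
@dataclass
class Activity:
    name: str
    run: Callable[[list], list]  # takes the records so far, returns new records

@dataclass
class Pipeline:
    name: str
    activities: List[Activity] = field(default_factory=list)

    def execute(self, records: list) -> list:
        # Run each activity in order, feeding its output to the next one.
        for activity in self.activities:
            records = activity.run(records)
        return records

def ingest(_: list) -> list:
    # Stand-in for copying data from a source data store.
    return [{"city": " Hyderabad ", "sales": "120"}, {"city": "Pune", "sales": "80"}]

def transform(records: list) -> list:
    # Stand-in for a transformation on a compute service: trim text, cast numbers.
    return [{"city": r["city"].strip(), "sales": int(r["sales"])} for r in records]

pipeline = Pipeline(
    "demo_pipeline",
    [Activity("ingest", ingest), Activity("transform", transform)],
)
result = pipeline.execute([])
print(result)
```

In the real service the activities would be things like a copy step and a transformation on a compute service, and the schedule would come from a trigger rather than a direct call to `execute`.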

3. What is the integration runtime?

  • The integration runtime (IR) is the compute infrastructure that Azure Data Factory uses to provide data integration capabilities across different network environments.
  • Azure Integration Runtime: It can copy data between cloud data stores and dispatch activities to a range of compute services, such as Azure HDInsight or SQL Server, where the transformation takes place.
  • Self-Hosted Integration Runtime: It is software that is virtually identical to the Azure Integration Runtime, but it must be installed on an on-premises machine or on a virtual machine inside a virtual network. A self-hosted IR can run copy activities between a public cloud data store and a data store in a private network. It can also dispatch transformation activities to compute resources in a private network. We use a self-hosted IR when Data Factory cannot reach the data store directly, for example because it sits behind a firewall in a private network.
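The choice between the two runtimes described above can be sketched as a simple rule. This is a simplified illustration under stated assumptions, not Data Factory logic: `DataStore`, `choose_integration_runtime`, and the store names are hypothetical, and real deployments have further options that this rule ignores.

```python
from dataclasses import dataclass

# Hypothetical model for illustration only.
@dataclass
class DataStore:
    name: str
    in_private_network: bool  # True for on-premises or virtual-network stores

def choose_integration_runtime(source: DataStore, sink: DataStore) -> str:
    # If either end is only reachable inside a private network, the copy
    # needs a self-hosted IR installed inside that network.
    if source.in_private_network or sink.in_private_network:
        return "SelfHosted"
    # Both ends are cloud stores reachable over public endpoints.
    return "Azure"

blob = DataStore("cloud_blob_store", in_private_network=False)
sql_cloud = DataStore("cloud_sql_database", in_private_network=False)
sql_onprem = DataStore("on_prem_sql_server", in_private_network=True)

print(choose_integration_runtime(blob, sql_cloud))
print(choose_integration_runtime(sql_onprem, blob))
```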

4. What is the limit on the number of integration runtimes?

  • There is no hard limit on the number of integration runtime instances you can have in a data factory. There is, however, a per-subscription limit on the number of VM cores that the integration runtime can use for SSIS package execution.

5. What is the difference between Azure Data Lake and Azure Data Warehouse?

  • A data warehouse is a traditional way of storing data that is still widely used today. A data lake is complementary to a data warehouse: data held in a data lake can also be stored in a data warehouse, but certain rules must be followed. The two compare as follows.
    Data Lake
    • Complementary to the data warehouse.
    • Data is detailed or raw and can take any shape or form. You simply take the data and deposit it into the data lake.
    • Schema on read (the data is not structured, and you can define the schema in any number of ways).
    • A single language is used to process the data.
    Data Warehouse
    • May be sourced from the data lake.
    • Data is filtered, summarised, and refined.
    • Schema on write (data is written in structured form or in a particular schema).
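The schema-on-read and schema-on-write distinction can be illustrated with a small sketch. Everything here is an assumption made for the example: `SCHEMA`, the function names, and the in-memory lists standing in for a warehouse table and a data lake.

```python
import json

SCHEMA = {"id": int, "amount": float}  # hypothetical schema for the example

def write_to_warehouse(table: list, row: dict) -> None:
    # Schema on write: reject the row unless it matches the schema exactly.
    if set(row) != set(SCHEMA):
        raise ValueError(f"unexpected columns: {sorted(row)}")
    for column, expected_type in SCHEMA.items():
        if not isinstance(row[column], expected_type):
            raise ValueError(f"{column} must be {expected_type.__name__}")
    table.append(row)

def write_to_lake(lake: list, raw: str) -> None:
    # Schema on read: store the raw text untouched, whatever its shape.
    lake.append(raw)

def read_from_lake(lake: list) -> list:
    # The schema is applied only now, when the data is read.
    rows = []
    for raw in lake:
        record = json.loads(raw)
        rows.append({"id": int(record["id"]), "amount": float(record["amount"])})
    return rows

warehouse, lake = [], []
write_to_warehouse(warehouse, {"id": 1, "amount": 9.5})
write_to_lake(lake, '{"id": "2", "amount": "3.25", "note": "extra field"}')

try:
    # Strings instead of numbers: the warehouse refuses this at write time.
    write_to_warehouse(warehouse, {"id": "2", "amount": "3.25"})
    rejected = False
except ValueError:
    rejected = True

lake_rows = read_from_lake(lake)
print(rejected, lake_rows)
```

The same loosely typed record is accepted by the lake and only given a shape when it is read, while the warehouse enforces its schema before anything is stored.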

6. What is blob storage in Azure?

  • Azure Blob Storage is a service for storing massive volumes of unstructured object data, such as text or binary data. Blob Storage can be used to expose data publicly or to store application data privately. It is commonly used for the following purposes:
    • Serving images or documents directly to browsers
    • Storing files for distributed access
    • Streaming video and audio
    • Storing data for backup and restore, disaster recovery, and archiving
    • Storing data for analysis by an on-premises or Azure-hosted service

7. What is the difference between Azure Data Lake Store and Blob storage?

  • Purpose
    Azure Data Lake Storage Gen1: Storage optimised for big data analytics workloads.
    Azure Blob Storage: A general-purpose object store that can be used for a wide range of storage scenarios, including big data analytics.
  • Structure
    Azure Data Lake Storage Gen1: A hierarchical file system.
    Azure Blob Storage: An object store with a flat namespace.
  • Key concepts
    Azure Data Lake Storage Gen1: The account contains folders, which in turn contain data stored as files.
    Azure Blob Storage: The storage account has containers, which hold data in the form of blobs.
  • Use cases
    Azure Data Lake Storage Gen1: Batch, interactive, and streaming analytics and machine learning data, such as log files, IoT data, click streams, and large datasets.
    Azure Blob Storage: Text or binary data of any form, such as application back end data, backup data, media storage for streaming, and so on.
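The structural difference between a flat namespace and a hierarchical file system can be sketched with two toy in-memory stores. This is an illustration only, using plain dictionaries with made-up file names. It does not call any Azure API, and `rename_prefix_flat` and `rename_folder_tree` are hypothetical helpers.

```python
# Flat namespace: every blob is one key; "folders" are only name prefixes.
flat_store = {
    "logs/2024/app.log": b"a",
    "logs/2024/web.log": b"b",
    "images/logo.png": b"c",
}

def rename_prefix_flat(store: dict, old: str, new: str) -> int:
    # Each matching blob must be rewritten under its new name.
    touched = 0
    for key in [k for k in store if k.startswith(old)]:
        store[new + key[len(old):]] = store.pop(key)
        touched += 1
    return touched

# Hierarchical namespace: folders are real nodes that contain files.
tree_store = {
    "logs": {"2024": {"app.log": b"a", "web.log": b"b"}},
    "images": {"logo.png": b"c"},
}

def rename_folder_tree(store: dict, old: str, new: str) -> int:
    # Only the one folder entry changes; its contents move with it.
    store[new] = store.pop(old)
    return 1

flat_ops = rename_prefix_flat(flat_store, "logs/", "archive/")
tree_ops = rename_folder_tree(tree_store, "logs", "archive")
print(flat_ops, tree_ops)
```

In the flat store the number of operations grows with the number of blobs under the prefix, while the hierarchical store changes a single folder entry. This is one reason a hierarchical file system suits analytics workloads that reorganise large directories.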
