Azure Data Factory Interview Questions For Freshers

1. Why do we need Azure Data Factory?

The amount of data generated today is massive, and it comes from a wide variety of sources. Several things need to be taken care of when we migrate this data to the cloud.
Because the data comes from many sources, it can take any form: each source transports or channels data by a different method and in a different format. When we move this data to the cloud or to a specific storage location, we must ensure that it is well managed. That is, we must transform the data and remove any unneeded parts. In terms of data movement, we must ensure that data is collected from many sources and brought to a common location where it can be stored. Azure Data Factory is a managed service for orchestrating both this movement and this transformation.

2. What is Azure Data Factory?

Azure Data Factory is a cloud-based data integration service for orchestrating and automating data movement and transformation.
• You can use Azure Data Factory to create and schedule data-driven workflows (called pipelines) that ingest data from various data stores, and that process and transform the data using compute services such as HDInsight Hadoop, Spark, Azure Data Lake Analytics, and Azure Machine Learning.
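A pipeline groups activities, and an activity can declare that it runs only after another one has succeeded. The minimal Python sketch below assumes a trimmed-down version of the JSON that Data Factory stores for a pipeline (the `activities` list and its `dependsOn` entries). The pipeline and activity names are hypothetical, real activities also need fields such as `linkedServiceName` and `typeProperties` that are omitted here, and the sketch ignores the dependency conditions. It only computes an order in which the activities could run.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline, trimmed to the fields needed here. Real activities
# also carry linkedServiceName, typeProperties and other settings.
pipeline = {
    "name": "IngestAndTransform",
    "properties": {
        "activities": [
            {
                "name": "TransformWithSpark",
                "type": "HDInsightSpark",
                # Run only after the copy activity has succeeded.
                "dependsOn": [
                    {"activity": "CopyRawData", "dependencyConditions": ["Succeeded"]}
                ],
            },
            {"name": "CopyRawData", "type": "Copy", "dependsOn": []},
        ]
    },
}


def execution_order(pipeline_def: dict) -> list[str]:
    """Return activity names in an order that respects every dependsOn edge."""
    graph = {}
    for activity in pipeline_def["properties"]["activities"]:
        # TopologicalSorter expects each node mapped to its predecessors.
        graph[activity["name"]] = [dep["activity"] for dep in activity.get("dependsOn", [])]
    return list(TopologicalSorter(graph).static_order())


print(execution_order(pipeline))  # ['CopyRawData', 'TransformWithSpark']
```

The copy step comes first because the transformation declares a dependency on it, even though it is listed second in the definition.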

3. What is the integration runtime?

The integration runtime (IR) is the compute infrastructure that Azure Data Factory uses to move data and dispatch activities across different network environments.
• Azure Integration Runtime: The Azure IR can copy data between cloud data stores and dispatch activities to a range of compute services, such as Azure HDInsight or SQL Server, where the transformation takes place.
• Self-hosted Integration Runtime: The self-hosted IR is software with essentially the same capabilities as the Azure IR. However, it must be installed on an on-premises machine or on a virtual machine in a virtual network. A self-hosted IR can perform copy operations between a public cloud data store and a data store in a private network. It can also dispatch transformation activities to compute resources in a private network. We make use of Self Hosted IR b.
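A linked service chooses its integration runtime through a `connectVia` reference. The sketch below builds SQL Server linked-service definitions as plain Python dicts: one that routes through a self-hosted IR, and one that omits `connectVia`, in which case Data Factory uses its default Azure IR. The helper function, the linked-service and runtime names, and the connection strings are all hypothetical.

```python
def sql_server_linked_service(name, connection_string, self_hosted_ir=None):
    """Build a (hypothetical) SqlServer linked service definition as a dict.

    If self_hosted_ir is given, the definition references that runtime via
    connectVia; otherwise the property is left out and the default Azure IR
    would be used.
    """
    properties = {
        "type": "SqlServer",
        "typeProperties": {"connectionString": connection_string},
    }
    if self_hosted_ir is not None:
        properties["connectVia"] = {
            "referenceName": self_hosted_ir,
            "type": "IntegrationRuntimeReference",
        }
    return {"name": name, "properties": properties}


# Database on a private network: route through a self-hosted IR.
on_prem = sql_server_linked_service(
    "OnPremSalesDb",
    "Server=sql01;Database=Sales;Integrated Security=True",
    self_hosted_ir="CorpNetworkIR",
)
# Publicly reachable database: no connectVia, so the default Azure IR applies.
public = sql_server_linked_service(
    "PublicSalesDb", "Server=sales.example.com;Database=Sales"
)

print(on_prem["properties"]["connectVia"])
print("connectVia" in public["properties"])  # False
```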

4. What is the limit on the number of integration runtimes?

In a data factory, there is no hard restriction on the number of integration runtime instances that can be present. There is, however, a limit on the number of VM cores that the integration runtime can employ for SSIS package execution per subscription.

5. What is the difference between Azure Data Lake and Azure Data Warehouse?

6. What is blob storage in Azure?

Azure Blob Storage is a service for storing massive volumes of unstructured object data, such as text or binary data. Blob Storage can be used to expose data publicly or to store application data privately. Blob Storage is commonly used for the following purposes:
• Serving images or documents directly to a browser
• Storing files for distributed access
• Streaming video and audio
• Storing data for backup and restore, disaster recovery, and archiving
• Storing data for analysis by an on-premises or Azure-hosted service

7. What is the difference between Azure Data Lake Store and Blob storage?

8. What are the steps for creating ETL process in Azure Data Factory?

Suppose we want to extract data from an Azure SQL Database, process it as required, and store it in Azure Data Lake Store.
ETL creation steps:
• Create a linked service for the source data store, the Azure SQL Database.
• Create a source dataset; assume it holds data about cars.
• Create a linked service for the destination data store, Azure Data Lake Store.
• Create a sink dataset for the data being saved.
• Create the pipeline and add a copy activity.
• Add a trigger to schedule the pipeline.
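The ETL steps map onto four kinds of JSON definitions that reference each other by name: linked services, datasets, a pipeline and a trigger. The Python sketch below writes them out as dicts and checks that every reference resolves. All names, the table `dbo.Cars`, the placeholder connection values and the start time are hypothetical, and the legacy-style dataset types (`AzureSqlTable`, `AzureDataLakeStoreFile`) are an assumption; check the current connector documentation before relying on any property shown.

```python
import json


def ref(name, kind):
    """Build a reference object in the {referenceName, type} shape."""
    return {"referenceName": name, "type": kind}


# Linked services for the source database and the Data Lake sink.
linked_services = [
    {"name": "AzureSqlSource",
     "properties": {"type": "AzureSqlDatabase",
                    "typeProperties": {"connectionString": "<sql-connection-string>"}}},
    {"name": "DataLakeSink",
     "properties": {"type": "AzureDataLakeStore",
                    "typeProperties": {"dataLakeStoreUri": "<data-lake-store-uri>"}}},
]

# Source dataset (the cars table) and sink dataset.
datasets = [
    {"name": "CarsTable",
     "properties": {"type": "AzureSqlTable",
                    "linkedServiceName": ref("AzureSqlSource", "LinkedServiceReference"),
                    "typeProperties": {"tableName": "dbo.Cars"}}},
    {"name": "CarsFile",
     "properties": {"type": "AzureDataLakeStoreFile",
                    "linkedServiceName": ref("DataLakeSink", "LinkedServiceReference"),
                    "typeProperties": {"folderPath": "cars", "fileName": "cars.txt"}}},
]

# Pipeline with a single copy activity from CarsTable to CarsFile.
pipeline = {
    "name": "CopyCarsPipeline",
    "properties": {"activities": [{
        "name": "CopyCars",
        "type": "Copy",
        "inputs": [ref("CarsTable", "DatasetReference")],
        "outputs": [ref("CarsFile", "DatasetReference")],
        "typeProperties": {"source": {"type": "AzureSqlSource"},
                           "sink": {"type": "AzureDataLakeStoreSink"}},
    }]},
}

# Schedule trigger that runs the pipeline once a day.
trigger = {
    "name": "DailyCarsTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {"recurrence": {"frequency": "Day", "interval": 1,
                                          "startTime": "2024-01-01T02:00:00Z",
                                          "timeZone": "UTC"}},
        "pipelines": [{"pipelineReference": ref("CopyCarsPipeline", "PipelineReference")}],
    },
}


def dangling_references():
    """Return every referenced name that is not defined above."""
    defined = {
        "LinkedServiceReference": {ls["name"] for ls in linked_services},
        "DatasetReference": {ds["name"] for ds in datasets},
        "PipelineReference": {pipeline["name"]},
    }
    refs = [ds["properties"]["linkedServiceName"] for ds in datasets]
    for activity in pipeline["properties"]["activities"]:
        refs += activity.get("inputs", []) + activity.get("outputs", [])
    refs += [p["pipelineReference"] for p in trigger["properties"]["pipelines"]]
    return [r["referenceName"] for r in refs
            if r["referenceName"] not in defined[r["type"]]]


print(dangling_references())  # []
print(json.dumps(pipeline, indent=2))
```

Keeping each definition separate mirrors the order of the steps: the datasets can only be defined once their linked services exist, and the copy activity only refers to datasets.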

9. What is the difference between HDInsight and Azure Data Lake Analytics?

10. How can I schedule a pipeline?

To schedule a pipeline, use a schedule trigger or a tumbling window trigger. The schedule trigger uses a wall-clock calendar schedule, which can run pipelines periodically or in calendar-based recurring patterns (for example, on Mondays at 6:00 PM and Thursdays at 9:00 PM).
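A schedule trigger's weekly `schedule` lists week days, hours and minutes. Assuming, as the schedule trigger documentation describes for hours and minutes, that these lists combine into every possible combination, one trigger listing Monday and Thursday with hours 18 and 21 would fire four times a week rather than twice. The Python sketch below models that expansion; the helper functions, trigger and pipeline names, and start time are hypothetical. Splitting the example into two triggers that reference the same pipeline is one way to get exactly Monday 6:00 PM and Thursday 9:00 PM.

```python
from itertools import product


def schedule_trigger(name, pipeline_name, week_days, hours, minutes):
    """Build a (hypothetical) weekly ScheduleTrigger definition as a dict."""
    return {
        "name": name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {"recurrence": {
                "frequency": "Week",
                "interval": 1,
                "startTime": "2024-01-01T00:00:00Z",
                "timeZone": "UTC",
                "schedule": {"weekDays": week_days, "hours": hours, "minutes": minutes},
            }},
            "pipelines": [{"pipelineReference": {
                "referenceName": pipeline_name, "type": "PipelineReference"}}],
        },
    }


def fire_times(trigger):
    """Return the set of (weekday, hour, minute) slots, assuming cross-product semantics."""
    s = trigger["properties"]["typeProperties"]["recurrence"]["schedule"]
    return set(product(s["weekDays"], s["hours"], s["minutes"]))


# One trigger with both days and both hours yields every combination.
combined = schedule_trigger("MonThu", "ReportPipeline", ["Monday", "Thursday"], [18, 21], [0])
print(len(fire_times(combined)))  # 4

# Two triggers, each pointing at the same pipeline, give exactly two slots.
monday = schedule_trigger("Monday6pm", "ReportPipeline", ["Monday"], [18], [0])
thursday = schedule_trigger("Thursday9pm", "ReportPipeline", ["Thursday"], [21], [0])
print(len(fire_times(monday) | fire_times(thursday)))  # 2
```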

11. Can I pass parameters to a pipeline run?

Yes. Parameters are a first-class, top-level concept in Data Factory. You can define parameters at the pipeline level and pass arguments when you run the pipeline on demand or from a trigger.
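The sketch below models how pipeline-level parameters combine with the arguments supplied for one run: a value passed for the run wins, otherwise the declared `defaultValue` is used. The pipeline, its parameter names, and the choice to raise `ValueError` for a parameter with neither a value nor a default are assumptions made for illustration. With the Azure SDK for Python (`azure-mgmt-datafactory`), an arguments dictionary like `run_args` is what you would pass as the `parameters` argument of `pipelines.create_run`.

```python
# Hypothetical pipeline declaring two parameters, one with a default value.
pipeline = {
    "name": "CopyCarsPipeline",
    "properties": {
        "parameters": {
            "sourceTable": {"type": "String", "defaultValue": "dbo.Cars"},
            "outputFolder": {"type": "String"},
        },
        "activities": [],
    },
}


def resolve_run_parameters(pipeline_def, arguments):
    """Merge run arguments over declared defaults (a sketch, not the service logic)."""
    declared = pipeline_def["properties"].get("parameters", {})
    resolved = {}
    for name, spec in declared.items():
        if name in arguments:
            resolved[name] = arguments[name]
        elif "defaultValue" in spec:
            resolved[name] = spec["defaultValue"]
        else:
            # Design choice of this sketch: fail early when a value is missing.
            raise ValueError(f"no value supplied for parameter {name!r}")
    return resolved


run_args = {"outputFolder": "cars/2024"}
print(resolve_run_parameters(pipeline, run_args))
# {'sourceTable': 'dbo.Cars', 'outputFolder': 'cars/2024'}
```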

12. Can an activity in a pipeline consume arguments that are passed to a pipeline run?

Yes. Each activity within the pipeline can consume the parameter values passed to the pipeline run by using the @pipeline().parameters construct, for example @pipeline().parameters.<parameterName>.
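As a rough model of how an activity consumes run arguments, the Python sketch below substitutes two common forms of reference: a whole value such as `@pipeline().parameters.waitSeconds` (here wrapped in the `{"value": ..., "type": "Expression"}` form used for dynamic content) and string interpolation such as `@{pipeline().parameters.sourceTable}`. It is a hypothetical toy, not Data Factory's expression engine: functions, escaping and other expression features are not handled, and the activity and parameter names are made up.

```python
import re

# Whole-value reference, e.g. "@pipeline().parameters.waitSeconds".
WHOLE_REF = re.compile(r"@pipeline\(\)\.parameters\.(\w+)")
# String interpolation, e.g. "SELECT * FROM @{pipeline().parameters.sourceTable}".
INTERPOLATED_REF = re.compile(r"@\{pipeline\(\)\.parameters\.(\w+)\}")


def resolve(value, params):
    """Recursively replace pipeline-parameter references with run arguments."""
    if isinstance(value, dict):
        # Dynamic content wrapped as {"value": "...", "type": "Expression"}.
        if value.get("type") == "Expression" and isinstance(value.get("value"), str):
            return resolve(value["value"], params)
        return {key: resolve(item, params) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve(item, params) for item in value]
    if isinstance(value, str):
        whole = WHOLE_REF.fullmatch(value)
        if whole:
            # A whole-value reference keeps the argument's own type.
            return params[whole.group(1)]
        return INTERPOLATED_REF.sub(lambda m: str(params[m.group(1)]), value)
    return value


activities = [
    {"name": "WaitBeforeCopy", "type": "Wait",
     "typeProperties": {"waitTimeInSeconds": {
         "value": "@pipeline().parameters.waitSeconds", "type": "Expression"}}},
    {"name": "CopyCars", "type": "Copy",
     "typeProperties": {"source": {
         "type": "AzureSqlSource",
         "sqlReaderQuery": "SELECT * FROM @{pipeline().parameters.sourceTable}"}}},
]
run_args = {"waitSeconds": 30, "sourceTable": "dbo.Cars"}
resolved = resolve(activities, run_args)
print(resolved[0]["typeProperties"]["waitTimeInSeconds"])  # 30
print(resolved[1]["typeProperties"]["source"]["sqlReaderQuery"])  # SELECT * FROM dbo.Cars
```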

13. What has changed from private preview to limited public preview in regard to data flows?

• You no longer have to bring your own Azure Databricks clusters; Data Factory manages cluster creation and tear-down.
• Blob datasets and Azure Data Lake Storage Gen2 datasets are separated into delimited text and Apache Parquet datasets.
• You can still use Data Lake Storage Gen2 and Blob storage to store those files. Use the appropriate linked service for those storage engines.