AZURE DATA ENGINEER INTERVIEW QUESTIONS
Microsoft Azure is a cloud computing platform that offers both hardware and software resources. Microsoft, as the service provider, runs these as managed services that users can access on demand.
PolyBase optimises data ingestion into PDW (Parallel Data Warehouse) and supports T-SQL. It lets developers query external data in supported data stores transparently, regardless of the storage architecture of the external data store.
• Query data stored in Hadoop, Azure Blob Storage, or Azure Data Lake Store from Azure SQL Database or Azure Synapse Analytics using PolyBase. This eliminates the need to import the data from the external source first.
• Import data from Hadoop, Azure Blob Storage, or Azure Data Lake Store with a few simple T-SQL queries, without installing a third-party ETL tool.
• Export data to Hadoop, Azure Blob Storage, or Azure Data Lake Store.
To reduce Azure Storage charges, Microsoft offers the option of reserving capacity on Azure Storage. A reservation provides clients with a fixed amount of storage capacity for the duration of the reservation period. Reserved capacity is available for Block Blobs and Azure Data Lake Storage Gen2 data in a standard storage account.
Azure Synapse is a limitless analytics service that brings together Big Data analytics and enterprise data warehousing. It gives users the freedom to query data on their own terms, using either serverless on-demand or provisioned resources, at scale.
It is intended to process enormous volumes of data in tables with hundreds of millions of rows. Because Synapse SQL runs on a Massively Parallel Processing (MPP) architecture that distributes data processing across numerous nodes, Azure Synapse Analytics can run complicated queries and return results in seconds, even on large datasets.
Applications communicate with a control node, which serves as the gateway to the Synapse Analytics MPP engine. When the control node receives a Synapse SQL query, it converts the query into an MPP-optimised format. The individual operations are then passed to compute nodes, which complete the tasks in parallel, resulting in much better query performance.
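The control node and compute node split described above can be illustrated with a small sketch. This is a toy model only, not how Synapse is implemented: the node count, the crc32-based hash distribution, and every function name below are assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_COMPUTE_NODES = 4  # hypothetical node count for this toy example


def distribute(rows, key, num_nodes=NUM_COMPUTE_NODES):
    """Hash-distribute rows across nodes by the value of `key`."""
    shards = [[] for _ in range(num_nodes)]
    for row in rows:
        # crc32 is a stable hash, so the same key always lands on the same node
        node = zlib.crc32(str(row[key]).encode("utf-8")) % num_nodes
        shards[node].append(row)
    return shards


def compute_node_sum(shard, group_key, value_key):
    """Work done by one compute node: a partial SUM grouped by `group_key`."""
    partial = defaultdict(int)
    for row in shard:
        partial[row[group_key]] += row[value_key]
    return dict(partial)


def control_node_query(rows, group_key, value_key):
    """Control node: fan the work out to compute nodes, then merge the partial results."""
    shards = distribute(rows, group_key)
    with ThreadPoolExecutor(max_workers=NUM_COMPUTE_NODES) as pool:
        partials = list(
            pool.map(lambda s: compute_node_sum(s, group_key, value_key), shards)
        )
    merged = defaultdict(int)
    for partial in partials:
        for group, total in partial.items():
            merged[group] += total
    return dict(merged)


sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
    {"region": "north", "amount": 3},
]
totals = control_node_query(sales, "region", "amount")
```

Each shard is aggregated independently, so the merge step on the control node only has to combine small partial results rather than scan every row.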
Dedicated SQL Pool is a set of features that lets Azure Synapse Analytics serve as a more traditional Enterprise Data Warehousing platform. Resources are provisioned through Synapse SQL and measured in Data Warehousing Units (DWU). A dedicated SQL pool stores data in relational tables with columnar storage, which improves query performance and reduces the amount of storage required.
Azure Stream Analytics is a dedicated analytics solution that offers a simple SQL-based language called Stream Analytics Query Language. Developers can extend the query language's capabilities by defining additional ML (Machine Learning) functions. Azure Stream Analytics can handle over a million events per second while delivering insights with ultra-low latency.
In Azure Stream Analytics, a window is a block of time-stamped event data that allows users to execute statistical operations on the event data.
To partition and analyse a window in Azure Stream Analytics, four types of windowing functions are available:
• Tumbling Window: The tumbling window function divides the data stream into distinct, non-overlapping, fixed-length time segments.
• Hopping Window: Hopping windows move forward in time by a fixed period, so consecutive windows can overlap and an event can belong to more than one window.
• Sliding Window: Unlike Tumbling and Hopping windows, which fire on a fixed schedule, aggregation occurs every time a new event arrives.
• Session Window: This window has three parameters: timeout, maximum duration, and partitioning key.
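The window types above can be sketched in plain Python to show how events are assigned to windows. This is a simplified illustration, not the Stream Analytics engine: it uses integer timestamps and half-open intervals [start, end), it omits the sliding window and the session partitioning key, and all function names are hypothetical. The service's own boundary and maximum-duration rules may differ.

```python
from collections import Counter


def tumbling_window(ts, size):
    """Return the single fixed-length window [start, end) that contains ts."""
    start = (ts // size) * size
    return (start, start + size)


def hopping_windows(ts, size, hop):
    """Return every window of length `size`, starting every `hop`, that contains ts."""
    windows = []
    start = (ts // hop) * hop  # latest window start at or before ts
    while start > ts - size:   # window [start, start + size) still covers ts
        if start >= 0:         # assumption: the stream begins at time 0
            windows.append((start, start + size))
        start -= hop
    return sorted(windows)


def session_windows(timestamps, timeout, max_duration):
    """Group events into sessions split by a gap > timeout or by max_duration."""
    sessions = []
    current = []
    for ts in sorted(timestamps):
        gap_too_long = bool(current) and ts - current[-1] > timeout
        session_too_long = bool(current) and ts - current[0] >= max_duration
        if gap_too_long or session_too_long:
            sessions.append(current)
            current = []
        current.append(ts)
    if current:
        sessions.append(current)
    return sessions


events = [1, 4, 12, 15, 27]
tumbling_counts = Counter(tumbling_window(ts, 10) for ts in events)
overlapping = hopping_windows(12, size=10, hop=5)
sessions = session_windows([1, 2, 3, 10, 11, 30], timeout=5, max_duration=20)
```

With a hop smaller than the window size, the event at time 12 falls into two windows, which is the overlap that distinguishes hopping from tumbling windows.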
In Azure, there are five categories of storage:
• Azure Blobs: Blob stands for binary large object. Azure Blobs can store many types of files, including text files, videos, images, documents, binary data, and so forth.
• Azure Queues: Azure Queues is a cloud-based message store that allows you to initiate and broker communication across different apps and components.
• Azure Files: It is a structured method of storing data in the cloud. Azure Files has one major advantage over Azure Blobs: it allows data to be organised in a folder structure and is SMB compliant, so it can be used as a file share.
• Azure Disks: It serves as a storage option for Azure virtual machines (VMs).
• Azure Tables: A NoSQL storage solution for structured data.
Azure Storage Explorer is a versatile standalone tool for managing Azure Storage from any platform. It is available as a download for Windows, macOS, and Linux.
It allows easy access to many Azure data stores, such as ADLS Gen2, Cosmos DB, Blobs, Queues, Tables, and so on.
One of Azure Storage Explorer’s important benefits is that it allows users to work even when they are offline from the Azure cloud service by attaching local emulators.
Azure Databricks is the Azure implementation of Apache Spark, an open-source big data processing platform. It sits in the data preparation or processing stage of the data lifecycle. First, data is ingested into Azure via Data Factory and stored persistently (for example, in ADLS Gen2 or Blob Storage). The data is then analysed in Databricks using Machine Learning (ML), and the resulting insights are loaded into analysis services in Azure such as Azure Synapse Analytics or Cosmos DB.
Finally, findings are visualised and presented to end users using analytical reporting tools such as Power BI.
Azure Table Storage is a storage service designed specifically for storing structured data. Table entities are the basic units of data and correspond to rows in a relational database table. Each entity is a collection of key-value pairs, and table entities have the following properties:
• PartitionKey: It stores the key of the partition to which the table entity belongs.
• RowKey: It uniquely identifies the entity within the partition.
• Timestamp: It records the date and time at which the table entity was last modified.
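The role of the three properties above can be sketched with an in-memory model. This is only an illustration of the key structure, not the Azure Tables SDK: the class names, method names, and sample data are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TableEntity:
    partition_key: str  # which partition the entity belongs to
    row_key: str        # unique within that partition
    properties: dict = field(default_factory=dict)  # the entity's key-value pairs
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class InMemoryTable:
    """Toy table: entities are addressed by (partition_key, row_key)."""

    def __init__(self):
        self._partitions = {}

    def upsert(self, entity):
        # Refresh the timestamp on every write, mirroring "last modified" semantics
        entity.timestamp = datetime.now(timezone.utc)
        self._partitions.setdefault(entity.partition_key, {})[entity.row_key] = entity

    def get(self, partition_key, row_key):
        return self._partitions[partition_key][row_key]

    def query_partition(self, partition_key):
        return list(self._partitions.get(partition_key, {}).values())


table = InMemoryTable()
table.upsert(TableEntity("customers-uk", "001", {"name": "Asha"}))
table.upsert(TableEntity("customers-uk", "002", {"name": "Ben"}))
table.upsert(TableEntity("customers-us", "001", {"name": "Cara"}))
uk_customer = table.get("customers-uk", "001")
```

The same RowKey ("001") can appear in two partitions because uniqueness is only required for the PartitionKey and RowKey pair taken together.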
In most computing scenarios, the programme code resides either on the server or on the client. Serverless computing, however, follows the stateless nature of code: the code runs without the user having to provision or manage any infrastructure.
Users pay only for the compute resources the code consumes during its brief period of execution, which makes the model very cost-effective.
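The pay-per-use idea can be made concrete with a small calculation. The sketch below assumes a consumption-style bill made up of a charge per GB-second of execution plus a charge per execution; both rates are hypothetical placeholders, not actual Azure prices, and the function name is made up for illustration.

```python
def consumption_cost(executions, avg_seconds, memory_gb,
                     rate_per_gb_second, rate_per_execution):
    """Cost of a workload billed only for the resources used while code runs."""
    gb_seconds = executions * avg_seconds * memory_gb
    return gb_seconds * rate_per_gb_second + executions * rate_per_execution


# Hypothetical rates, chosen only to make the arithmetic easy to follow
HYPOTHETICAL_RATE_PER_GB_SECOND = 0.00002
HYPOTHETICAL_RATE_PER_EXECUTION = 0.0000002

monthly_cost = consumption_cost(
    executions=1_000_000,
    avg_seconds=0.5,
    memory_gb=0.25,
    rate_per_gb_second=HYPOTHETICAL_RATE_PER_GB_SECOND,
    rate_per_execution=HYPOTHETICAL_RATE_PER_EXECUTION,
)
idle_cost = consumption_cost(0, 0.5, 0.25,
                             HYPOTHETICAL_RATE_PER_GB_SECOND,
                             HYPOTHETICAL_RATE_PER_EXECUTION)
```

Under these assumed rates, a million half-second executions at 0.25 GB come to about 2.70 currency units, and a month with no executions costs nothing, which is the contrast with paying for an always-on server.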
• Azure SQL Firewall Rules: Azure offers two levels of firewall rules. The first are server-level firewall rules, which are stored in the SQL master database and control access to the Azure database server. The second are database-level firewall rules, which govern access to individual databases.
• Azure SQL Always Encrypted: It is intended to protect sensitive data stored in Azure SQL Database, such as credit card details.
• Azure SQL Transparent Data Encryption (TDE): TDE encrypts data stored in Azure SQL Database. It performs real-time encryption and decryption of the database, its backups, and its transaction log files.
• Azure SQL Database Auditing: Azure provides auditing for Azure SQL Database, which tracks database events and writes them to an audit log.
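The two levels of firewall rules listed above can be sketched as a simple check. This is a minimal model, assuming IPv4 start and end address ranges and assuming that database-level rules are evaluated before server-level rules; the function names, rule layout, and addresses are hypothetical and do not represent an Azure API.

```python
from ipaddress import ip_address


def in_any_range(client_ip, rules):
    """True if client_ip falls inside any (start_ip, end_ip) range."""
    addr = ip_address(client_ip)
    return any(ip_address(start) <= addr <= ip_address(end) for start, end in rules)


def is_connection_allowed(client_ip, database, db_rules, server_rules):
    """Check the target database's own rules first, then fall back to server rules."""
    if in_any_range(client_ip, db_rules.get(database, [])):
        return True
    return in_any_range(client_ip, server_rules)


# Hypothetical rule sets using documentation-reserved address ranges
server_rules = [("203.0.113.0", "203.0.113.255")]
db_rules = {"sales": [("198.51.100.10", "198.51.100.20")]}

sales_ok = is_connection_allowed("198.51.100.15", "sales", db_rules, server_rules)
hr_blocked = is_connection_allowed("198.51.100.15", "hr", db_rules, server_rules)
server_ok = is_connection_allowed("203.0.113.7", "hr", db_rules, server_rules)
```

A database-level rule opens only the database it is attached to, while a server-level rule admits the client to every database on the server.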