Introduction to Big Data and Azure Data Factory
- Overview of Big Data concepts
- Importance of Big Data Analytics
- Key use cases across different industries
Getting Started with Azure Data Factory
- Introduction to Azure Data Factory (ADF)
- Scenarios for using ADF
- Understanding code-free ETL
Core Concepts of Azure Data Factory
- Linked Services, Datasets, and Activities
- Pipelines and Triggers
- Integration Runtimes and Data Flows
Configuring Azure Storage Accounts
- Overview of Storage Accounts and types (Blob, Queue, Table)
- Creating and configuring Storage Accounts
- Demo: Uploading data and creating an Azure Data Factory instance
Working with Databricks and Azure Data Factory
- Creating Databricks Clusters and Notebooks
- Overview of data flow and debugging
- Integrating Databricks with Azure Data Factory
Copying Data between Blob Storage Locations
- Scenarios for copying data from Blob to Blob
- Creating pipelines for data movement
- Using dynamic content and metadata
Azure SQL Database Integration
- Introduction to Azure SQL Database and relational databases
- Creating and managing SQL databases and tables
- Copying data between SQL databases
Logging and Monitoring in Azure Data Factory
- Fetching Pipeline logs using Stored Procedures
- Setting up audit logs for ADF operations
- Debugging and monitoring data movements
Change Data Capture and Incremental Loads
- Implementing incremental loads from SQL to Blob Storage
- Managing last processed values for tables
- Using Lookup and Copy Activities effectively
Triggering Pipelines in Azure Data Factory
- Event-based and schedule triggers
- Best practices for pipeline triggering
- Implementing complex triggers for data ingestion
Data Flows and Transformation
- Introduction to Data Flows in Azure Data Factory
- Performing SQL joins and other transformations
- Ensuring data quality through Data Flows
Introduction to Azure Synapse Analytics
- Overview of Azure Synapse Analytics
- Key components and architecture
- Benefits of using Azure Synapse for data analytics and integration
Setting Up Azure Synapse Workspace
- Creating an Azure Synapse Analytics Workspace
- Navigating the Azure Portal for Synapse
- Understanding workspace features and settings
Integrating Pipelines in Azure Synapse
- Overview of data integration capabilities
- Building and managing data pipelines in Synapse
- Best practices for integrating various data sources
Monitoring and Administration in Synapse
- Monitoring your Azure Synapse Workspace
- Adding administrators and managing permissions
- Overview of administrative accounts in Synapse SQL
Leveraging Apache Spark in Azure Synapse
- Introduction to Apache Spark within Azure Synapse
- Creating and configuring Spark Pools
- Writing and executing notebooks in Synapse
Advanced Data Operations and Best Practices
- Creating Spark Job Definitions and submitting jobs
- Managing library packages for Apache Spark
- Using temporary tables and the OPENROWSET() function
- Best practices for data governance and security in Synapse
Introduction to Azure Databricks
- Overview of Azure Databricks
- Features and advantages of Databricks
- Databricks architecture
- Creating Databricks workspaces
- Managing Databricks workspaces
Data Management in Databricks
- Understanding DBFS
- Databases and tables
- Creating and managing databases and tables
- Working with Hive tables in Databricks
- Unity Catalog: Managing Data Access and Governance
Delta Lake Fundamentals
- Understanding Delta Lake and its advantages
- ACID transactions, time travel, and data versioning
- Managing and optimizing data storage with Delta Lake
Real-Time Stream Processing
- Introduction to Spark Streaming
- Processing real-time data with Databricks
- Structured Streaming
- Building Scalable Streaming Applications
MICROSOFT FABRIC (DP-700)
Introduction to Microsoft Fabric & Data Engineering
- Overview of Microsoft Fabric architecture
- Understanding Data Engineering in Fabric
- Key components: OneLake, Synapse, Data Factory, Notebooks
- Differences between Microsoft Fabric and traditional data engineering solutions
OneLake - Storage & Data Management
- Fundamentals of OneLake in Microsoft Fabric
- Data Ingestion from different sources (Azure Data Lake, AWS S3)
- Managing structured & unstructured data in OneLake
- Partitioning, Compression, and Indexing for large datasets
Data Factory - Ingestion & Orchestration
- Introduction to Data Factory in Microsoft Fabric
- Creating pipelines and dataflows in Data Factory
- Automating data pipelines using triggers & scheduling
Dataflows & Real-Time Data Processing
- Understanding Dataflows vs Pipelines in Fabric
- Processing real-time streaming data
- Data transformation & enrichment using Power Query
Synapse Data Science & Machine Learning
- Overview of Synapse Data Science workloads
- Setting up Fabric Notebooks for ML
- Building and training ML models with Scikit-learn & PyTorch
AI & Real-Time Analytics in Fabric
- Implementing Real-time AI using Synapse & KQL
- Predictive analytics and anomaly detection
- AI-powered insights using Power BI & Fabric Dashboards