Chapter 1: Cloud Basics, Azure SQL
- Cloud Introduction and Azure Basics
- Azure Implementation: IaaS, PaaS, SaaS
- ADE Job Roles; Azure Storage Components
- Azure ETL & Streaming Components; Need for Azure Data Factory (ADF)
- Need for Azure Synapse Analytics; Azure Resources and Resource Types
- Azure Account, Subscription (Free); Azure SQL Server [Logical Server]; Firewall Rules
- Azure SQL Database & SQL Pool Deployment; DTU Versus DWU; SSMS Connections
Chapter 2: Synapse SQL Pools (DWH)
- Dedicated SQL Pools in Azure; Data Warehouse with Synapse
- Massively Parallel Processing (MPP); Control Nodes and Compute Nodes
- DMS: Data Movement Service; Start/Resume/Pause& Scaling
- SQL Pool Config @ TSQL Scripts; Start/Resume/Pause, Scaling Options
- Table Creations@ TSQL Scripts; Table Partitions: Left & Right
- Distributions: Round Robin, Hash
- Distributions: Replicate and Usage; Auto Indexing & Column Store; Planning for Big Data Loads; Need for ADF
Chapter 3: Azure Data Factory, Pipelines
- ADF Concepts; ADF Pipelines
- Architecture; Integration Runtime (IR) & Use
- Linked Services and Datasets; Pipeline Activities: Copy Data Tool; DIU : Data Integration Units; DTU Vs DWUs Vs DIU; ADF Pipeline with Copy Data Tool
- Azure SQL DB to Synapse Data Loads
- Multi Tables Data Loads with ADF; Bulk Insert, Data Copy Methods
- ETL Staging: Storage Account; Staging Container Connections; DIU Allocations; ETL Pipeline Monitoring, Runs
Chapter 4: OnPremise Data Loads, Upsert
- Copy Data Tool :
- Incremental Loads; On-Premise Data Sources with Azure; Self Hosted Integration Runtime (IR); Access Keys, Remote Linked Service
- Synapse SQL Pool (DW), on Premise; ETL Staging with Storage Account
- Copy Method: Polybase – Tuning; Polybase : Big Data Loads
- ETL Pipelines for Incremental Loads; Business Keys For Table Upsert; Pipeline Schedules with ADF; ETL Logging with Storage Account
- Copy Method: UPSERT; DIU, DOCP & Publish; Manual Pipeline Executions in ADF
Chapter 5: File Incremental Loads in ADF
- Incremental Loads with Files (BLOB); ETL Schedules
- Tumbling Window; Execution Retry and Delay Options; Binary Copy
- Structural Data Loads; Incremental Loads Verification Tests
- Incompatible Rows & Fault Tolerance; Pipeline Compression & Tuning; Pipeline Publish, Monitor Options
- Azure Monitor Resource : Metrics; ADF Metrics; Pipeline Monitoring; Synapse: Storage Monitoring, Alerts; Conditions, Signal Rules and Metrics; Alerts & Action Groups: Emails
Chapter 6: ADF Data Flow - 1
- Data Flow Task, Data Flow Activity; Transformations with Data Flow
- Spark Cluster for Debugging; Cluster Node Configurations; Spark Cluster Types & Sizing; Transaction Optimized – Capacity
- Memory Optimized – Capacity; Data Cleansing with ADF; Data Orchestration with Data Flow
Chapter 7: ADF Data Flow - 2
- ADF Pipelines For ETL Operations; Data Flow Tasks, Activities in Synapse; JOIN & EXISTS Transformations; Aggregate & Group By Transformations
- Window Functions, Rank in Data Flow; Rank / DenseRank / Row Number; Derived Column Transformation; Lookup, Surrogate Key, Parse
- Type Convert, Cast Transformations; Reusing Data Flow Tasks in Synapse; Pipeline Validations & Executions
- Inline Datasets, Schema Drift; Data De-duplication with ADF; DFT OptimizationTechniques; Data Flow Task - Staging, Logging
Chapter 8: Azure Synapse Analytics
- Azure Synapse Analytics Resource; Azure Synapse Analytics Workspace; Managed Resource Group, SQL Account; Synapse Workspace & Synapse Studio; Operations with Synapse Workspace; DLS Gen 2 Storage Account, Container
- Synapse Studio: Scripts & Pipelines; Dedicated SQL Pools: Creation, Use; Synapse Tables, Data Loads with TSQL; COPY INTO Statements with T-SQL; Row Terminator and Compression; T-SQL Queries and Aggregations; Aggregation Data Loads; Synapse Pipelines with TSQL; Stored Procedure Activity & Triggers
Chapter 9: Synapse Analytics with Spark
- Synapse Pipelines: Performance Advantage; Pivot Transformation For Normalization; Generate Pivot Column, Aggregations; Pivot Transformation & Pivot Setting; Pivot Key Selection, Value and Nulls
- Pivoted Columns & Column Pattern; Column Prefix, Help Graphic, Metadata; Denormalized Data and Aggregations; Apache Spark Pool in Azure Synapse
- Spark Cluster Nodes: Vcores,Memory; Notebooks : Purpose, Usage Options; Python Notebooks; Databases in Apache Spark Pool; Data Loads from Dedicated SQL Pools; PySpark Code for Data Operations, Writes
Chapter 10: Synapse Security & Parameters
- Azure Active Directory (AAD) Users, Groups; IAM: Identity & Access Management; Synapse Workspace Security with RBAC; ADF Security: RBAC, Owner, Contributor; Azure Synapse SQL Pool Security: Logins; Creating SQL Logins & Users : master; SQL Users in Azure SQL DB and SQL Pool
- Grant, Control, Revoke: Security Roles; Parameters - Creation and Use in Pipelines; Dynamic Connections with Credentials; User Name and Password Connectivity; Dynamic Dataset Configurations; Pipeline Expressions with Parameters; Resource Classes
Chapter 11: Change Data Capture (CDC)
- Change Data Capture (CDC) Data Loads; Incremental Loads with CDC Types; SQL Server CDC : ETL Load Dates; Pipeline Expression, Data Window; JSON Parameters, Pipeline Scheduling; ETL Optimization Techniques; Serverless Pool in Azure Synapse; Connections, Use with Serverless Pool; Using Azure Open Datasets in Synapse
- OPENROWSET and BULK Data Loads; Working with Parquet Files in Synapse; Python Notebooks (Pyspark) in Synapse
AZURE DATABRICKS
Chapter 1: Azure Intro, Azure Databricks (CDC)
- Azure Cloud : SaaS, PaaS, PaaS & IaaS; Azure Cloud : Storage, ETL Resources; Azure Databricks; Compute Resources; Need for Azure Databricks (ADB)
- Azure Databricks :Purpose & Config; ;Azure Databricks Service Creation; Azure Databricks Component; Azure Databricks Workspace, Usage; Spark Cluster Configurations, Capacity; Driver Nodes, Worker Nodes in Spark
- Cluster Types : Personal, Unrestricted; CPU, Memory & IO Resources; Virtual Machines (VM) for Clusters; Databricks : Runtime & DBFS Storage; DBFS : Files, Tables with Spark DB
Chapter 2: SparkDatabase, SQL Notebooks
- DBFS : File Uploads from ON-Premise; Creating Spark Tables; Spark DB; Data Explorer: HIVE Metastore; Data Explorer; Spark Database, Tables; Notebooks: SQL, Python and Scala; Creating SQL Notebooks in Databricks
- Creating User Defined Spark Databases; Connecting / Using Spark Databases; Spark SQL : Big Data Loads; Spark SQL : Database & Table List; Spark SQL; Data
- Aggregations, Jobs; Spark SQL : Data Analytics, Reports; Analytics: X, Y Axis, Group By; Notebooks : Export, Import, Clone; Notebooks : Storage & Versions
Chapter 3: Python Intro, Data Loads
- Python : Introduction, Real-time Use; Python For ETL and DWH; Python For Azure: Data Engineer; Python Data Frames & Purpose; Python Dataframes – Pandas; Python with Spark Integrations; PySpark for DDL and ETL; PySpark Versus SQL Notebooks; Reading DBFS Data into Spark; Creating Dataframes for ETL; Temporary Views & Dataframes; Spark Temp Views: Aggregations; Spark Table Loads, HIVE Data; dataframe.write.format(); Spark Parquet Tables
Chapter 4: PySpark with ADLS
- Azure Storage Account : Creation; Azure Data Lake Storage : HNS; Creating Containers in ADLS; BLOB File Uploads / Generation; Account Key : Access Key / SAS Key; BLOB Access URL for Databricks; WASBS URL for PySpark Notebook; Generating PySpark Script; PySpark Connection Variables; Databricks : Data Import Scripts; Config Options with ADLS, Spark; spark.config (), Session Context; DataFrames with Temp Tables; Escape Sequence ; HIVE & Spark DB
Chatper 5: PySpark Widgets & Spark
- Widgets : Notebook Parameters; dbutils.widget module : Text, Combo; Dropdown, Multi Select Parameters; dbutils help(), get() & remove()
- Dataframes, Spark SQL @ Variables; Python Data Frames, Spark SQL; Reading Parameters Values; Parameters Versus Variables; Using Parameters For Temp Tables; Using Parameters for Spark Tables
- Data Storage and HIVE Metastore; Reading Parameterized Data; Format Strings with PySpark; Dynamic Queries with Spark SQL; Aggregations and f Strings
Chapter 6: Architecture, Workflows
- Driver Nodes, Worker Nodes, DBUs; RDD : Resilent Data Distribution; DAG : Directed Acyclic Graph; Hadoop HDES and Spot Instance; Cluster Manager, Master Node; RDDS, Worker, Excecutor & Slave; Hadoop HDES & Databricks Runtime
- Databricks Optimization Techniques; Spot Instance, Photon Acceleration; All Purpose Cluster, Job Cluster; Databricks Jobs: Creation & Tasks; Jobs with Parameters, Executions; Task Dependency & Notifications; Continuous & Manual Schedules; Active Jobs, Recent Run Jobs, Monitor
Chapter 7: Databricks Security, Scala
- Azure Databricks Security Operations; Azure Active Directory (Azure AD); AD Users and RBAC with IAM; Owner, Contributor & Reader Roles; Workspace Admin
- Permissions; Notebook Permissions & Share; Workflow Security, HTTP Path; User Tokens & ServerName; Scala : Differences with PySpark; Scala
- Variables Declaration, Usage; SparkSQL with Scala Notebooks; Temp Views with Scala Notebooks; Aggregations with Scala Notebooks; Visual Data Analytics with Scala; PySpark to Scala Conversions
Chapter 8: Scala with ADLS, Azure SQL
- Data Imports with Azure SQL DB; Using Scala for Big Data Loads; Spark SQL Queries @ Temp Views; Variables, display(), spark.read(); Scala Transformations, display(); JSON, AVRO and DBFS Mounts; fs.azure.sas.container @ ADLS; dataframe.write.jdbc() & JVM; JDBC Connection, DataframeWriter; Data Extraction, SQLContext
- Spark Context and Spark Session; SQLServerDriver with Scala; ADLS with Scala Notebooks; ;Parameters (Widgets) with Scala
Chapter 9: DeltaLake Incr Loads, DWH
- Azure DeltaLake Implementation; ACID Properties, Upsert Advantages; Delta Engine Optimizations & Uses; Pipeline Creation: JSON Files in DBFS; Delta Tables Creation, Data Loads; Spark Cluster Settings: Auto Optimize; Auto Compact, Delta Table Optimize; JSON Files, Delta Streaming Location; Joins and Merge with Delta Tables
- Incremental Loads, Delta Tables; Create & Use DWH with Databricks; Upsert (Merge) with Spark Tables; Big Data & Jupyter Notebooks; Databricks with Data Factory (ADF); End to End Implementations
POWER BI
POWER BI INTRODUCTION
- Power BI : Introduction to Analytics
- Power BI Tools Suite, Advantages
- Power BI : Career Options, Plan
- Power BI Developer Job Role
- Microsoft Data Analyst Job Role
- Big Data Analyst Job Role
- Power BI Data Analyst (PL 300)
- Data Engineer*, Power BI (DP 500 *)
- Artificial Intelligence (AI) Visuals
- AI Enabled Power BI Features
- Course - Lab Plan with Design Tools
- Need for Power Query & DAX
- Power BI Licensing Types
- BI Cloud – Advantages
- Power BI Report Server Advantages
Basic Report Design
- Power BI Eco System: Architecture
- Data Sources & Types in Real-world
- Report Types: Interactive, Paginated
- Analytical Reports & Mobile Reports
- Data Sources : File, Database, Web
- Visualizations : Report
- Power BI Desktop Tool : Installation
- Desktop Interface: Overview, Canvas
- Get Data, Data View, Report View
- In-Memory Xvelocity Database
- Basic Visuals: Table, Tree Map
- Data Labels, Legend, Category
- Local Store: PBIX & PBIT Files
- Points and Tooltips
Basic Report Design
- Visual Interaction with Data Points, Disabling / Enabling Interactions
- Edit Interactions: Format Options, Spotlight and Focus Mode
- Report Export to CSV, PDF, Tooltip Options and Usage
- Working with Pages in PBI, Rename, Duplicate, Hide Pages
- Slicer Visual : Real-time Usage, Orientation, Selection Properties
- Slicer Settings : Tiles & Slider, Single & Multi Select, Header
- Number, Text, Show Summary,Date Slicer and Value Selections
- Slicer List, Dropdowns & Clear
Grouping & Hierarchies
- Grouping : Visuals with Pdf Sources, List Grouping and Binning Options
- Grouping Static / Fixed Data Values, Grouping Dynamic / Changing Data
- Bin Size and Bin Limits (Max, Min),Bin Count and Grouping Options
- Group with Bins & Clustering, Group, Layer with Selection Pane
- Creating Hierarchies in Power BI, Independent, Dependant Drill-Down
- Drill-Down with Interactive Reports,Conditional Drilldowns, Data Points
- Drill Up Buttons and Operations,Expand & Show Next Level
- Dynamic Data Drills Limitations
Filters & Bookmarks
- Filters : Types and Usage in Real-time, Visual Filter, Page Filter, Report Filter
- Basic, Advanced and TOP N Filters,Category and Summary Level Filters
- Data / Drill Options, DrillThru Filters, Keep All Filters" Options in DrillThru
- CrossReport Filters, Include, Exclude, Drill-thru Filters, Page Navigations
- Bookmarks : Report Navigations, Buttons, Images with Actions
- Selection Pane, Actions, Text URLs, Show Data and See Records
- Custom Tooltips, Table Visual, Table Vs Matrix : Drill-downs
- Styles, Cell Properties, Databars, Conditional Formatting, Divergent
Big Data Access, Visuals
- OLTP Databases, Big Data Sources, Azure Database Access, Reports
- Import, Direct Query & Dual Mode, Data Modeling: Do Not Summarize
- Data Modeling: Currency, Relations,Power BI Archtiecture, Eco System
- Power BI Interface for Reports, Stacked Chart, Clustered Chart
- Line Chart, Area Chart, Bar Chart, 100% Stacked Bar & Column Chart
- Map Visuals: Tree, Filled, Bubble, Small Multiples, Legends, AxisV
- Cards, Funnel, Table, Matrix, Scatter Chart : Play Axis, Labels
- Waterfall Chart, Multi Row Cards
POWER QUERY LEVEL 1
- Power Query M Language Purpose
- Power Query Architecture and ETL
- Data Types, Literals and Values
- Power Query Transformation Types
- Table & Column Transformations
- Text & Number Transformations
- Date, Time and Structured Data
- let, source, in statements @ M Lang
- Get Data, Table Creations and Edit
- ETL Operations with Power Query
- Merge Transformations in Power BI
- Join Kinds: Inner, Outer & Apply
- Union All Transformation & Appends
- Power Query Editor, Step Edits
- & Apply Options. Report Design
POWER QUERY LEVEL 2
- Query Duplicate, Query Reference
- Group By and Advanced Options
- Aggregations with Power Query
- Transpose, Header Promotion
- Reverse Rows and Row Count
- Data Type Changes & Detection
- Replace Columns: Text, NonText
- Advanced Query Edit Options
- Replace Nulls: Fill Up, Fill Down
- Pivot, Unpivot Transformations
- Move Column and Split Column
- Date & Time Transformations
- Derive Year, Quarter, Month, Day
- Add Column : Query Expressions
- Query Step Inserts and Step Edits
POWER QUERY LEVEL 3
- Big Data Loads : Parameter Queries
- Creating Parameters in Power Query
- Parameter Data Types, Default Lists
- Static & Dynamic Lists: List Queries
- Convert Tables to Lists, Use Cases
- Linking Parameters to Queries
- Testing Parameters with Canvas
- Multi-Valued Parameter Lists
- Creating Lists in Power Query
- Converting Lists to Table Data
- Invoke Function, Type Conversions
- Function Query & Parameter List
- Columns From Examples, Indexes
- Conditional Columns, Expressions
- Disable / Enable Data Loads
POWER BI CLOUD - 1
- Power BI Cloud Components
- App Workspaces, Report Publish
- Reports & Related Datasets Cloud
- Creating New Reports in Cloud
- Report Publish, Report Uploads
- Report Edits and New Reports
- Report Actions: Downloads
- Dataset Usage Options in Cloud
- Dashboards Creation and Usage
- Pining Visuals and Report Pages
- Visual Pin Actions in Dashboards
- Dashboard & LIVE Interactions
- Media Tiles: Images, Custom Links
- Q & A Option with Dashboards
- Pin with Q & A; Standard Visuals
POWER BI CLOUD - 2
- Report Actions : Share, Subscribe
- Report Actions : Lineage, Embed
- Report Actions : Export Options
- Report Actions : Public User Access
- Dashboard Actions : Share, Subscribe
- Dashboard Actions : Themes, Lineage
- Dashboard Actions : Share, Subscribe
- Favorite, Insights, Embed Code
- Gateways Configuration, PBI Service
- Gateway Types, Cloud Connections
- Gateway Cluster, Add Data Sources
- Data Refresh : Manual, Scheduled
- Power Query Parameters, Gateways
- DataFlows, Power Query in Cloud
- Lineage, Share, Subscribe, Insights
- Performance Inspector& Gateways
POWER BI CLOUD - 3
- Workbooks : Excel Online & Pins
- Power BI Apps: Creation & Usage
- Power BI Segments, Content
- Navigation Screens, Audience
- App Publish, Verification & Edits
- Export, Share & Subscribe
- List View & Lineage View Options
- Power BI Scorecards: Realtime Use
- Paginated Reports - Design & Usage
- Power BI Report Builder Tool
- Microsoft Report Builder Tool
- Report Builder : Datasets, Charts
- Report Builder : Bar Charts, Fields
- Report Builder : Creating RDL Files
- Paginated Reports : Deployments
DAX Functions - Level 1
- DAX : Importance in Real-time
-
- DAX Data Types, Syntax Rules
- DAX Measures and Columns
- ROW Context and Filter Context
- Operators, Special Characters
- DAX Functions, Vertipaq Engine
- DAX Cheat Sheet : Expressions
- Data Analytics with DAX
- DAX Measures : Expressions
- ISBLANK, IF, IN, SUM
- SUMX, AVG, AVERAGEX
- Data Models: Fact, Dimensions
- Detecting Relations for DAX
- Star & Snowflake Schemas
- Data Modeling Options in DAX
DAX Functions - Level 2
- Quick Measures in Power BI
- and Filtered Average
- Running Totals, EARLIER( )
- RELATED, COUNTROWS
- CALCULATE Function Conditions
- ALL Members Scope & IN
- Account and Time Calculations
- Star Rating, DAX Expressions
- Data Modeling Options in DAX
- 1:1, 1:M and M:1 Relations
- Working with Facts & Measures
- Modeling : Missing Relations
- Relationships & Importance
- Modeling : Relation Management
- Modeling with Multiple Keys
DAX Functions - Level 3
- DAX : Variables and Expressions
- Dynamic Expressions, RETURN
- Current Value, Previous Value
- SELECTED VALUE, Joins
- FORMAT Function with DAX
- RELATED, Joins in DAX
- DAX Expressions with SQL DB
- Time Intelligence Functions
- Date Dimension : Generation
- CALENDAR(), DATESYTD()
- TOTALYTD, TOTALQTD
- TODAY, DATE, DAY with DAX
- SELECTEDVALUE, FORMAT
- Date, Time and Text Functions
DAX Functions - Level 4
- RLS: Row Level Security
- Data Models in Power BI Desktop
- DAX Roles Creation and Testing
- DAX Expressions & Operators
- PBIX Uploads: Power BI Cloud
- Dataset Security with DAX Roles
- Entity Sets and Slicing in DAX
- Dataflows with Power BI
- Analytical Reports - DAX Usage
- Creating Data Models with DAX
- Datasets in Excel and Dashboards
- Using Excel Analyzer in Power BI
- Power BI Data Source in Excel
- Strings and Refresh
- Analytical Reports - Limitations
Power BI Report Server
- Power BI Report Server Config
- SQL Server Instance Verifications
- Report Server DB, Temp Database
- WebService & WebPortal URL
- Uploading Interactive Reports
- End User Report Share (pdf)
- Power BI Desktop RS Tool
- Interactive Reports: Report Server
- Mobile Reports : Design Options
- Mobile Reports : Grids, Elements
- Mobile Reports : Uploads, Edits
- Paginated Reports : Deployments
- Paginated Vs Interactive Reports
- Paginated Vs Analytical Reports
- Paginated Vs Mobile Reports
- Power BI Report Server Vs Cloud
Power BI Admin & AI
- Power BI Cloud Management, Power BI Admin : Alerts
- Workspace Management, Users
- Security: Report, Dataset Levels, Security: Dataset, App Levels
- Security: Workspace Options
- PBI Performance Inspector
- Power BI & Artificial Intelligence, Power BI & CoPilot Add-Ins
- AI Visuals & Big Data Analytics
- Smart Narrative and Q & A
- Infographics, Icons and Labels
- Key Influencer Visual in Power BI, Metrics Visual, Performance
- Paginated Reports Visual