Hadoop ETL Training

→ 1. Hadoop Fundamentals

  • Motivation for Hadoop
  • Hadoop Overview
  • HDFS
  • MapReduce
  • Hadoop Ecosystem
  • Use Cases

→ 2. Hadoop Distributed File System (HDFS)

  • HDFS Architecture
  • NameNodes and DataNodes
  • HDFS Workflow
  • Data Blocks and Their Placement
  • Hands on Exercise: Using HDFS

→ 3. MapReduce

  • MapReduce Architecture
  • JobTracker and TaskTracker
  • How MapReduce Works
  • Hands on Exercise: Running a MapReduce Job

→ 4. Introduction to HBase

→ 5. Hadoop Setup

  • Making a fully distributed cluster on a single laptop/desktop
  • Install and Configure Apache Hadoop on a multi node cluster in lab
  • Install and Configure Cloudera Hadoop distribution in fully distributed mode
  • Install and Configure Hortonworks Hadoop distribution in fully distributed mode

→ 6. Introduction to ETL

  • What is ETL?
  • Necessity for ETL
  • Introduction to open source ETL tools

→ 7. Introduction to Pig

  • What is Pig?
  • Pig's Features
  • Pig's use cases
  • Interacting with Pig
  • Setup and Configuration of Pig
  • Grunt Shell
  • Hands on Exercise: Using Pig

→ 8. Basic Data Analysis with Pig

  • Pig Latin Syntax
  • Loading Data
  • Simple Data Types
  • Field Definitions
  • Data Output
  • Viewing the Schema
  • Filtering and Sorting Data
  • Commonly used Functions
  • Hands on Exercise: Using Pig for ETL Processing

→ 9. Processing complex data with Pig

  • Storage Formats
  • Complex/Nested Data Types
  • Grouping
  • Built-in Functions for Complex Data
  • Iterating Grouped Data
  • Hands on Exercise

→ 10. Multi-Dataset Operations with Pig

  • Techniques for Combining Data Sets
  • Joining Data Sets in Pig
  • Set Operations
  • Splitting Data Sets
  • Hands on Exercise

→ 11. Pig's Extended Features

→ 12. Introduction to Hive

  • What is Hive?
  • Hive Schema and Data Storage
  • Comparing Hive to Traditional Databases
  • Hive vs Pig
  • Hive Use Cases
  • Interacting with Hive
  • Hive Setup and Configuration
  • Hands on Exercise: Hive Setup

→ 13. Relational Data Analysis with Hive

  • Hive Databases and Tables
  • Basic HiveQL Syntax
  • Data Types
  • Joining Data Sets
  • Common Built-in Functions
  • Hands on Exercise: Running Hive Queries on the Shell, Scripts and Hue

→ 14. Hive Data Management

  • Hive Data Formats
  • Creating Databases and Hive-Managed Tables
  • Loading Data into Hive
  • Altering Databases and Tables
  • Self-Managed Tables
  • Simplifying Queries with Hive
  • Storing Query Results
  • Hands on Exercise: Data Management with Hive

→ 15. Text Processing with Hive

  • Overview of Text Processing
  • Important String Functions
  • Using Regular Expressions in Hive
  • Sentiment Analysis and N-Grams
  • Hands on Exercise

→ 16. Hive Extended Features

→ 17. Introduction to Impala

  • What is Impala?
  • How Impala Differs from Hive and Pig
  • How Impala Differs from Relational Databases
  • Impala, Dremel and Apache Drill
  • Limitations and Future directions
  • Using the Impala Shell
  • Hands on Exercise

→ 18. Analyzing Data with Impala

  • Basic Syntax
  • Data Types
  • Filtering, Sorting and Limiting Results
  • Joining and Grouping Data
  • Hands on Exercise

→ 19. Choosing the best tool for the Job

  • Comparing MapReduce, Pig, Hive, Impala and Relational Databases
  • Which one to choose?

→ 20. Introduction to Flume

  • Architecture
  • Reliability
  • Scalability
  • Manageability
  • Extensibility

→ 21. Setup and Configure Flume

  • Setup and Configuration of Flume
  • Hands on Exercise

→ 22. Introduction to Sqoop

→ 23. Sqoop Tools

  • Using Command Aliases
  • Controlling the Hadoop Installation
  • Using Generic and Specific Arguments
  • Using Tools
  • Setup and Configuration of Sqoop
  • Hands on Exercise

→ 24. Sqoop Import

  • Connecting to a Database Server
  • Selecting the Data to Import
  • Free-form Query Imports
  • Controlling Parallelism
  • Controlling the Import Process
  • Controlling type mapping
  • Incremental Imports
  • File Formats
  • Importing Data into Hive
  • Importing Data into HBase
  • Hands on Exercise

→ 25. Sqoop Export

  • Introduction
  • Inserts vs Updates
  • Exports and Transactions
  • Hands on Exercise

→ 26. ETL using Scripts

  • Shell Scripts
  • AWK
  • Perl
  • Hands on Exercise for Shell Scripts, Awk and Perl

→ 27. Other ETL tools

  • ETL using Talend
  • Informatica ETL
  • Other ETL tools

→ 28. ETL using Pentaho

  • Introduction to Pentaho & Hadoop
  • Visual development for Hadoop data preparation and modelling
  • Interactive visualization and exploration for Hadoop
  • Pentaho Visual MapReduce
  • Pentaho connection with HDFS
  • Pentaho connecting to MapReduce
  • Pentaho connecting to HBase
  • Pentaho connecting to Hive