→Guidelines on Talend Data Integration v6 Certified Developer Exam preparation

→Big data - Hadoop with Talend DI

  • What is Hadoop?
  • Why Hadoop? Can't the existing technology stack store and process large volumes of data or unstructured data?
  • What is commodity hardware? How is it different from server-class hardware?
  • Which Hadoop distributions does Talend support? How to handle distributions that Talend does not support?
  • Market-leading Hadoop distributions and a comparison of the most widely used distros
  • What is a block, and why is the HDFS block size so large?
  • NameNode - DataNode / master - slave architecture
  • Rack awareness
  • HDFS data write operation
  • HDFS data read operation

→Hadoop daemons and architecture

  • Master daemons in MR1
  • MapReduce: how the distributed data processing framework works on Hadoop 1.x
  • Limitations of Hadoop 1.x
  • Master daemons in MR2
  • YARN architecture
  • Step-by-step walkthrough of YARN application execution

→Hadoop client gateway connectivity

  • What is an edge node / gateway node?
  • Ambari views, HUE
  • Talend - Hadoop cluster connection
  • HDFS commands
  • Lab Practical

→MapReduce job: conventional approach

  • Walkthrough of MapReduce classes and a sample job
  • Sample MapReduce job execution
  • Lab Practical and assignments

→MapReduce job with Talend Big Data edition

  • How does Talend Studio execute a Hadoop job, and how do competing ETL tools execute Hadoop jobs?
  • MapReduce job design in the Talend Big Data sandbox Studio
  • MapReduce job demonstration in Talend Big Data Studio pointing to an external Hortonworks cluster
  • Lab Assignment

→Sqoop

  • What is Sqoop?
  • Sqoop Import/Export architecture
  • Sqoop connectors
  • Sqoop sample scripts
  • What are direct-mode imports? What is the advantage of using direct mode?
  • Escape characters
  • Import

    RDBMS to HDFS

  • Full table Import
  • Import all tables, or only a subset of the data
  • Encoding null values
  • Incremental import
  • Why does a Sqoop import operation require either a primary key column or a split-by column?

    RDBMS to Hive

  • Hive Import
  • Hive import with partitions

    RDBMS to HBase

  • HBase table, HBase create table, HBase row key
  • How to improve the performance of an HBase import job
  • Export

    HDFS to RDBMS

  • Insert data
  • Insert data in batches
  • Update an existing dataset in a database table
  • Update else Insert
  • Necessity of --mapreduce-job-name

    Lab practical on Sqoop import to HDFS, Hive, and HBase, and export to a database table

    Sqoop Components in Talend studio

  • tSqoopImport, tSqoopImportAllTables
  • tSqoopExport, tSqoopMerge
  • Sample Talend job execution with Sqoop components
  • Lab practical with the above list of Sqoop components

→PIG

  • What is Pig, and what is its role in the Hadoop framework?
  • Demo of a sample Pig script
  • Grunt shell: Local mode and cluster mode

Types

  • Scalar types, Complex types
  • Relation/Alias
  • Operators: Input & Output, Relational
  • User defined functions
  • Cogroup
  • Parameter substitution
  • Pig script execution steps
  • Debugging a Pig relation/script: DESCRIBE, EXPLAIN, ILLUSTRATE
  • Pig Components in Talend studio

  • tPigLoad, tPigFilterRow, tPigSort, tPigJoin
  • tPigCoGroup, tPigAggregate, tPigDistinct
  • tPigMap, tPigCode, tPigReplicate, tPigStoreResult
  • Sample Talend job execution with Pig components

    Lab practical with the above list of Pig components, with more on tPigMap

→Hive

    • What is Hive? Why do we need Hive?
    • Hive services and Hive clients.
    • Hive Architecture and Role of Metastore
    • Schema on Write vs Schema on Read
    • How is Hive different from a regular RDBMS?
    • Types of HQL execution
    • Data types: primitive, complex
    • Types of tables in Hive
    • Multi-table inserts
    • Use of partitions and types of partitions in Hive
    • What is bucketing? When to use bucketing and when to use partitioning
    • UDF types in Hive
    • HCatalog

      Lab practical on: create table, view, index, load/insert, multi-table insert, dynamic partition, CTAS, alter table, select and joins, create a UDF

      Hive Components in Talend studio

    • tHiveConnection, tHiveCreateTable, tHiveLoad
    • tHiveRow, tHiveInput, tHiveClose
    • Sample Talend job execution with Hive components

      Lab practical with the above list of Hive components

→HCatalog components in Talend Studio

    • tHCatalogOperation, tHCatalogLoad or tHCatalogOutput, tHCatalogInput
    • Sample Talend job execution with HCatalog components

      Lab practical with the above list of HCatalog components

→HBase

    • What is HBase?
    • RDBMS features that HBase lacks
    • HBase internals and architecture
    • Architectural differences between an RDBMS and HBase
    • HBase data storage Architecture (LSM-tree)
    • Data representation in HBase
    • Compare HBase with Hadoop file system
    • Pros and cons of column-oriented databases
    • HBase components and their functions: ZooKeeper, HMaster, RegionServer, client, catalog tables
    • What if an HBase master node goes down?
    • When should one think of using HBase?
    • When not to use HBase?
    • What is Phoenix?
    • HBase table DDL: create, disable, drop, alter
    • Data types
    • Reading, writing, and modifying data in an HBase table using commands.
    • Reading, writing, and modifying data using the HBaseConfiguration and HTable classes
    • HBase Components in Talend studio:

    • tHBaseConnection, tHBaseInput
    • tHBaseOutput, tHBaseClose
    • Sample Talend job execution with HBase components
    • Lab practical with the above list of HBase components