Hadoop Training Module

Hadoop

Apache Hadoop is an open-source software framework for distributed storage and distributed processing of Big Data on clusters of commodity hardware. Its Hadoop Distributed File System (HDFS) splits files into large blocks (64 MB by default in Hadoop 1.x, 128 MB in Hadoop 2.x) and distributes the blocks among the nodes in the cluster. To process the data, Hadoop MapReduce ships packaged code (JAR files) to the nodes that hold the required data, and those nodes then process the data in parallel. This approach takes advantage of data locality, in contrast to a conventional HPC architecture, which usually relies on a parallel file system (compute and data are separated but connected by high-speed networking).
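The idea of splitting data into blocks and processing each block where it lives can be sketched in plain Python. This is only an illustrative, single-machine simulation and not Hadoop's API: the function names (`split_into_blocks`, `map_block`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical, and a "block" here is a small group of text lines rather than a 64 MB or 128 MB chunk of a file.

```python
from collections import defaultdict


def split_into_blocks(records, records_per_block):
    """Cut the input into fixed-size blocks (the last one may be shorter)."""
    return [
        records[i:i + records_per_block]
        for i in range(0, len(records), records_per_block)
    ]


def map_block(block):
    """Map phase: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped_outputs):
    """Shuffle phase: group all emitted values by key across blocks."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce phase: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records, records_per_block=2):
    blocks = split_into_blocks(records, records_per_block)
    # On a real cluster each map_block call would run on the node that
    # stores the block; here the calls simply run one after another.
    mapped = [map_block(block) for block in blocks]
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    lines = ["big data big", "data locality", "big cluster"]
    print(word_count(lines))
```

Each block is mapped independently of the others, which is what lets a cluster run the map step in parallel on the nodes that already store the data.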

Duration

30 Hours

Hadoop
    Big Data, Hadoop, Introduction to Hadoop Architecture and HDFS
  • Rise of Big Data
  • Compare Hadoop vs traditional systems
  • Core components of Hadoop
  • Hadoop Master-Slave Architecture
  • Understanding HDFS Architecture
  • NameNode, DataNode, Secondary NameNode
  • Learn about JobTracker, TaskTracker
  • Installing and setting up a Hadoop Cluster
  • Hadoop deployment Modes - Standalone, Single node, Multinode
  • Configuration files in a Hadoop Cluster
  • Important Web URLs for Hadoop
  • Manual for installation of Hadoop
  • Manual for Demo VM installation
  • Manual for Multinode Hadoop Cluster installation on AWS
  • Understanding Hadoop MapReduce Framework
  • Overview of the MapReduce Framework
  • Use cases of MapReduce
  • MapReduce Architecture
  • Concept of Mappers, Reducers
  • Anatomy of MapReduce Program
  • Mapper/Reducer Class, Driver code
  • Understand Combiner and Partitioner
  • Advanced MapReduce - Part 1
  • Write your own Partitioner
  • Writing Map and Reduce in Python
  • Map Side Join
  • Distributed Join
  • Distributed Cache
  • Reduce Side Join
  • Counters
  • Joining Multiple datasets in MapReduce
  • Advanced MapReduce - Part 2
  • MapReduce internals
  • Understanding Input Format
  • Custom Input Format
  • MapReduce API
  • Hadoop Data Types
  • Using Writable and WritableComparable
  • Understanding Output Format
  • Sequence Files
  • JUnit and MRUnit Testing Frameworks
  • Apache Pig
  • Pig vs MapReduce
  • Pig components
  • Pig execution
  • Pig Data types
  • Pig Architecture
  • Pig Latin Relational Operators
  • Pig Latin Join and CoGroup
  • Pig Latin Group and Union
  • Describe, Explain, Illustrate
  • Pig Latin: File Loaders
  • Pig Latin: Creating UDF
  • Apache Hive and HiveQL
  • What is Hive
  • Hive DDL - Create/Show/Drop Database
  • Hive DDL - Create/Show/Drop Tables
  • Hive DML - Load Files into Tables
  • Hive DML - Inserting Data into Tables
  • Hive SQL - Select, Filter, Join, Group By
  • Hive Architecture & Components
  • Hive Data Model and Data Units
  • Difference between Hive and RDBMS
  • Advanced HiveQL
  • Multi-Table Inserts
  • Joins
  • Grouping Sets, Cubes, Rollups
  • Custom Map and Reduce scripts
  • Hive SerDe
  • Hive UDF
  • Hive UDAF
  • Apache Flume, Apache Sqoop, Apache Oozie
  • Sqoop - How Sqoop works
  • Import/Export Data
  • Sqoop Architecture
  • Flume - How it works
  • Flume Complex Flow - Calculation/Multiplexing
  • Oozie - Simple/Complex Flow
  • Oozie - Components
  • Oozie Service/Scheduler
  • Example Workflow
  • Use Cases - Time and Data triggers
  • Running/Debugging a Coordinator Job
  • Bundle
  • NoSQL Databases
  • Introduction to NoSQL
  • CAP theorem
  • RDBMS vs NoSQL
  • Analytical (OLAP)
  • Key Value stores: Memcached, Riak
  • Key Value stores: Redis, DynamoDB
  • Column Family: Cassandra, HBase
  • Graph Store: Neo4j
  • Document Store: MarkLogic, MongoDB
  • Document Store: Couchbase, CouchDB, eXist-db
  • Apache HBase
  • When/Why to use HBase
  • HBase Architecture/Storage
  • HBase Features
  • HBase Data Model
  • HBase Families
  • Terms and Daemons
  • HBase Master
  • HBase vs RDBMS
  • Column Families
  • Access HBase Data
  • HBase API
  • Runtime modes
  • Running HBase
  • Apache Zookeeper
  • What is Zookeeper
  • Who is using it
  • Zookeeper Data Model
  • ZNode versions
  • Zookeeper API
  • ZNode Types
  • Sequential ZNodes
  • Security
  • Standalone/Clustered mode
  • Installing and Configuring
  • Running Zookeeper
  • Zookeeper use cases
  • Hadoop 2.0, YARN, MRv2
  • Hadoop 1.0 Limitations
  • MapReduce Limitations
  • History of Hadoop 2.0
  • HDFS 2: Architecture
  • HDFS 2: Quorum based storage
  • HDFS 2: High availability
  • HDFS 2: Federation
  • YARN Architecture
  • Classic vs YARN
  • YARN Apps
  • YARN multitenancy
