Delving Deep into Hadoop – Course Contents
Introduction to Hadoop and Architecture
Hadoop 1.0 Architecture
- Introduction to Hadoop & Big Data
- Hadoop Evolution
- Hadoop Architecture
- Networking Concepts
- Use cases: where Hadoop fits in
Hadoop 2.0 Architecture
- Limitations of Hadoop 1.0 Architecture
- Features of Hadoop 2.0 Architecture
- HDFS Federation
- High Availability of Name Node
- YARN – Yet Another Resource Negotiator
- Developing Applications on YARN
- Non-MR applications on top of YARN
Quiz on Architecture Concepts
Cluster Installation
Hadoop Cluster Installation
- Types of Hadoop Clusters
- Installing a Pseudo-Distributed Mode Cluster
- Walkthrough of built-in scripts, directories, configuration files, and port numbers
- Discussion of real-world cluster sizing
Detailed documentation on Installation Procedure
Distributed File System - HDFS
HDFS Commands
- Introduction to HDFS Commands
- Discussion on scenarios where specific commands are applicable
- Introduction to Advanced HDFS Commands, including fine-tuning of the cluster
Detailed documentation on all the HDFS Commands
Custom Script building using HDFS & Unix commands
Quiz on HDFS Commands
Map Reduce - MR
Map Reduce using Java
- Introduction to Map Reduce Architecture
- Detailed discussion on different phases of MR
- Mapper
- Reducer
- Splitting
- Sorting
- Shuffling
- Combiner
- Partitioning
- Developing Map Reduce Applications from Scratch using different use cases
- Discussion of the differences between the Old MR API & the New MR API
- Introduction to different file formats and their internal features (Sequential, Binary, etc.)
- Analytics using MR to derive a Banking Solution
Case Study on Map Reduce (Customer Sentiment Analyser)
Map Reduce using Python – Streaming
- Developing Map Reduce Application using Python
- Discussion of different features available in Streaming
Case Study on Map Reduce Streaming (Analytics on Temperature Datasets)
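As a rough illustration of the MR phases listed earlier (splitting, mapping, combining, partitioning, shuffling and sorting, reducing), the sketch below simulates them locally in plain Python on a tiny temperature dataset. It is not course material and does not use Hadoop: the "year,temperature" record format, the sample values, and every function name are assumptions made for this example. In an actual Hadoop Streaming job the mapper and reducer would typically be separate scripts that read standard input and write standard output, with the framework handling the phases in between.

```python
from collections import defaultdict
from itertools import groupby


def split_input(lines, split_size):
    """Splitting: cut the input into chunks, one per (simulated) map task."""
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]


def mapper(line):
    """Map: emit a (year, temperature) pair for one 'year,temperature' record."""
    year, temp = line.strip().split(",")
    yield year, int(temp)


def combiner(pairs):
    """Combine: pre-aggregate one map task's output (local maximum per year)."""
    local_max = {}
    for year, temp in pairs:
        local_max[year] = max(temp, local_max.get(year, temp))
    return list(local_max.items())


def partition(key, num_reducers):
    """Partition: pick a reducer for a key (simple character-sum hash, an assumption)."""
    return sum(ord(c) for c in key) % num_reducers


def reducer(year, temps):
    """Reduce: keep the maximum temperature seen for a year."""
    return year, max(temps)


def run_job(lines, split_size=2, num_reducers=2):
    """Run all phases in-process and return {year: max_temperature}."""
    buckets = defaultdict(list)
    for split in split_input(lines, split_size):
        mapped = [pair for line in split for pair in mapper(line)]
        for year, temp in combiner(mapped):
            buckets[partition(year, num_reducers)].append((year, temp))

    results = {}
    for r in range(num_reducers):
        # Shuffle and sort: each reducer sees its pairs ordered and grouped by key.
        shuffled = sorted(buckets[r])
        for year, group in groupby(shuffled, key=lambda kv: kv[0]):
            key, value = reducer(year, [t for _, t in group])
            results[key] = value
    return results


# Hypothetical sample records, invented for this sketch.
SAMPLE = ["1949,111", "1950,22", "1949,78", "1950,-11", "1951,40"]

if __name__ == "__main__":
    for year, temp in sorted(run_job(SAMPLE).items()):
        print(f"{year}\t{temp}")
```

Because taking a maximum is associative and commutative, the same function logic can serve as both combiner and reducer here; that would not hold for an aggregate such as an average.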
Quiz on Map Reduce
Hadoop Eco System Components
Hive (Data Warehouse on top of HDFS)
- Introduction to Hive Architecture
- Configuring the Hive Metastore in different ways
- Basic Queries in Hive (DDL, DML)
- Advanced features of Hive
- Partitioning
- Bucketing
- Sampling
- Multi Table Load Queries
- Serialization & Deserialization (SerDe)
- Dealing with different formats of data (Flat file, JSON, CSV, etc.)
- Query optimization in Hive
- Developing User Defined Functions (UDFs) in Java & Python
Case Study (Analytics on Telecom Datasets)
Quiz on Hive
PIG (Data Flow Language)
- Introduction to Pig Latin
- Basic Commands in Pig
- Explanation of advanced features of Pig with real-world scenarios
- Different ways of using PigStorage
- Dealing with Unstructured data
- Developing Regular Expressions
- Developing User Defined Functions (UDFs) in Java & Python
Case Study (Analytics on Books Datasets)
Quiz on Pig
SQOOP (Import – Export utility)
- Introduction to Sqoop
- Basic Sqoop Commands
- Advanced Import Features
- Advanced Export Features
- Upsert Calls
- EVAL
- Compressed Formats
Case Study (Analytics on Telecom Datasets)
Quiz on Sqoop
HBASE (Versioned Database)
- Introduction to HBASE & NoSQL
- Basic differences between Row-Oriented and Column-Oriented storage
- Basic HBASE Commands
- Advanced HBASE Features
- Versions
- Compression Techniques
- Bloom Filters
- Sequential Scans
- Bulk Loads into HBASE
Case Study on HBASE
Quiz on HBASE
Flume
- Flume Architecture
- Configuring Flume Components
- Building Flume Config files for different scenarios
- Basic Config File building
- Config file for connecting to different File Servers
- Config file for connecting to Web Servers
Quiz on Flume
Spark
- Introduction to Spark and In-memory applications
- Understanding RDD (Resilient Distributed Dataset)
- Spark Context and Spark SQL Context
- Introduction to MLlib and Spark Streaming
Quiz on Spark
Kafka
- Introduction to Kafka architecture
- Single and Multi-Broker configuration
- Java Sample Producer
- Integration of Kafka with Hadoop (Flume)
Quiz on Kafka
Finally, this series of Practical Sessions ends with a Quiz on the entire course.