HADOOP
COURSE CONTENT
Introduction to Big Data – 1 Hour
- What is Big Data? What is Hadoop?
- Limitations of existing solutions for Big Data
- Hadoop differentiating factors
- Hadoop ecosystem components
Hadoop and its Architecture – 1 Hour
- Hadoop Distributed File System
- Hadoop Architecture
- MapReduce & HDFS
The Hadoop Distributed File System (HDFS) – 3 Hours
- HDFS Design & Concepts
- Blocks, NameNodes and DataNodes
- HDFS High-Availability
- The Hadoop DFS Command-Line Interface
- Basic File System Operations
- Anatomy of File Read
- Anatomy of File Write
- Exercise on HDFS
MapReduce – 3 Hours
- Map and Reduce Basics
- How MapReduce Works
- Anatomy of a MapReduce Job Run
- Job Submission, Job Initialization, Task Assignment, Task Execution
- Progress and Status Updates
- Job Completion, Failures
- Shuffling and Sorting
- Partitioner & Combiner
- Hadoop Streaming
MapReduce Programming in Java – 3 Hours
- Hands-on "Word Count" in MapReduce using Eclipse
- Sorting files using the Hadoop Configuration API: discussion
- Emulating "grep" to search inside a file in Hadoop
- Chain Mapping API discussion
- Job Dependency API discussion
- Input Format API discussion
- Input Split API discussion
- Custom data type creation in Hadoop
Pig – 4 Hours
- Installation
- Execution Types
- Grunt Shell
- Pig Latin
- Data Processing
- Loading and Storing
- Filtering
- Grouping & Joining
- Working with Functions
- User-Defined Functions
- Hands-on Exercises
Hive – 4 Hours
- Installation
- Hive Services
- Hive Shell
- Hive Server
- Hive Web Interface (HWI)
- Metastore
- HiveQL
- OLTP vs OLAP
- Working with Tables
- Working with Partitions
- User-Defined Functions
- Hands-on Exercises
HBase – 4 Hours
- HBase Installation
- HBase concepts
- HBase vs RDBMS
- Master & Region Servers
Sqoop – 3 Hours
- Installation
- Import data from RDBMS
- Export data to RDBMS
- Hands-on Exercises
Other Ecosystems – 2 Hours
Pure-Play Hadoop Distribution Vendors – 2 Hours
Proofs of Concept (POCs)