Big Data Architect

The Big Data Masters Program makes you proficient in the tools and systems used by Big Data experts. It includes training on the Hadoop and Spark stacks, Cassandra, Talend, and the Apache Kafka messaging system.

After completing the course successfully, participants should be able to:

Explain the need for Big Data, and list its applications

Demonstrate mastery of HDFS concepts and the MapReduce framework

Design and propose a solution for a given Big Data problem

Participate in Big Data adoption and planning projects

Install and configure Big Data tools

Make predictions using machine learning

Understand the overall design goals of each of Hadoop's schedulers

Understand the functions and features of Hadoop's metrics collection capabilities

Discuss and differentiate various commercial Big Data distributions, such as Cloudera and Hortonworks

Differentiate between Hadoop 1.0 and Hadoop 2.0

  • The weekend batch consists of 16 classes of 5 hours each.
  • The weekday batch consists of 40 classes of 2 hours each.
  • Total: 80 hours.
  • Pre-Training
  • Actual Training
  • Post-Training

Fundamentals of SQL

Linux Fundamentals & Basic Admin Commands

Core Java Crash Course

Overview of Ganglia and Nagios

Introduction to Ambari

Overview of BI Tools

Overview of Tableau

Overview of Cloudera Manager

Certification Support

Project Support

100% Placement Assistance


  • Introduction to Big Data Technology Stack
  • Introduction to Hadoop and the Hadoop Ecosystem
  • Understanding Cluster Setup Activities
  • HDFS Architecture
  • Hive Architecture
  • Pig Architecture
  • Introduction to NoSQL
  • HBase Architecture
  • Understanding Cloudera Manager and HUE
  • Describe the function of HDFS daemons
  • Describe the normal operation of an Apache Hadoop cluster, both in data storage and in data processing
  • Identify current features of computing systems that motivate a system like Apache Hadoop
  • Classify major goals of HDFS Design
  • Given a scenario, identify appropriate use case for HDFS Federation
  • Identify components and daemons of an HDFS HA-Quorum cluster
  • Analyze the role of HDFS security (Kerberos)
  • Determine the best data serialization choice for a given scenario
  • Describe file read and write paths
  • Identify the commands to manipulate files in the Hadoop File System Shell
  • Understand how to deploy core ecosystem components, including Spark, Impala, and Hive
  • Understand how to deploy MapReduce v2 (MRv2 / YARN), including all YARN daemons
  • Understand basic design strategy for YARN and Hadoop
  • Determine how YARN handles resource allocations
  • Identify the workflow of a job running on YARN
  • Determine which configuration files you must change, and how
Hadoop Cluster Installation and Administration
  • Given a scenario, identify how the cluster will handle disk and machine failures
  • Analyze a logging configuration and logging configuration file format
  • Understand the basics of Hadoop metrics and cluster health monitoring
  • Identify the function and purpose of available tools for cluster monitoring
  • Be able to install all the ecosystem components in CDH 5, including (but not limited to): Impala, Flume, Oozie, Hue, Cloudera Manager, Sqoop, Hive, and Pig
  • Identify the function and purpose of available tools for managing the Apache Hadoop file system
Resource Management
  • Understand the overall design goals of each of Hadoop's schedulers
  • Given a scenario, determine how the FIFO Scheduler allocates cluster resources
  • Given a scenario, determine how the Fair Scheduler allocates cluster resources under YARN
  • Given a scenario, determine how the Capacity Scheduler allocates cluster resources
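To make the contrast between these schedulers concrete, here is a minimal toy model (plain Python, not Hadoop code) of how FIFO and Fair scheduling might divide a fixed pool of containers among competing jobs; the job demands and capacity numbers are made up for illustration:

```python
# Toy model (not Hadoop code) contrasting FIFO and Fair scheduling.
# Each job demands some number of containers; the cluster has a fixed capacity.

def fifo_allocate(demands, capacity):
    """FIFO: satisfy jobs in submission order until capacity runs out."""
    allocation = []
    for demand in demands:
        granted = min(demand, capacity)
        allocation.append(granted)
        capacity -= granted
    return allocation

def fair_allocate(demands, capacity):
    """Fair: repeatedly give one container to the job furthest below its demand."""
    allocation = [0] * len(demands)
    for _ in range(capacity):
        # pick the job with the fewest containers that still wants more
        candidates = [i for i, d in enumerate(demands) if allocation[i] < d]
        if not candidates:
            break
        i = min(candidates, key=lambda j: allocation[j])
        allocation[i] += 1
    return allocation

demands = [8, 4, 4]                # three jobs' container demands
print(fifo_allocate(demands, 10))  # first job crowds out the others
print(fair_allocate(demands, 10))  # containers spread across jobs
```

The FIFO run gives the first job everything it asks for and starves later jobs, while the fair run converges toward an even split; the real YARN Fair and Capacity Schedulers add queues, weights, and preemption on top of this basic idea.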
Monitoring and Logging
  • Understand the functions and features of Hadoop's metrics collection capabilities
  • Analyze the NameNode and JobTracker Web UIs
  • Understand how to monitor cluster daemons
  • Identify and monitor CPU usage on master nodes
  • Describe how to monitor swap and memory allocation on all nodes
  • Identify how to view and manage Hadoop’s log files
  • Interpret a log file
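As a small illustration of log interpretation, the sketch below parses a single hypothetical log line in the Log4j pattern that Hadoop daemons commonly use (timestamp, level, emitting class, message); the sample line itself is invented for the example:

```python
import re

# A hypothetical log line in the Log4j layout Hadoop daemons commonly emit.
line = ("2015-06-01 12:34:56,789 WARN "
        "org.apache.hadoop.hdfs.server.namenode.NameNode: "
        "Low on available disk space")

LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\s+"
    r"(?P<source>\S+):\s+"
    r"(?P<message>.*)$"
)

match = LOG_PATTERN.match(line)
if match:
    print(match.group("level"))   # severity of the event
    print(match.group("source"))  # the daemon class that logged it
```

Filtering on the `level` and `source` fields like this is the usual first step when triaging a misbehaving daemon from its log file.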


  • Introduction to MapReduce
  • MapReduce Engine
      – JobTracker
      – TaskTracker
  • MapReduce Programming Model
      – Mapper Class
      – Reducer Class
  • Executing MapReduce Jobs
  • MapReduce and Java
  • MapReduce Programs in Java and Eclipse
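Before writing the Java versions, the programming model itself can be sketched in a few lines. The toy, single-process Python below imitates the mapper → shuffle/sort → reducer pipeline for a word count; real jobs subclass Hadoop's `Mapper` and `Reducer` classes in Java and run distributed:

```python
from collections import defaultdict

# Toy, single-process sketch of the MapReduce programming model:
# mapper -> shuffle (group by key) -> reducer.

def mapper(line):
    """Emit (word, 1) for every word in an input line."""
    for word in line.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    """Sum the counts for one word."""
    return key, sum(values)

lines = ["Hadoop stores data in HDFS",
         "Hadoop processes data with MapReduce"]
pairs = [pair for line in lines for pair in mapper(line)]
counts = dict(reducer(k, v) for k, v in shuffle(pairs).items())
print(counts["hadoop"])  # 2
```

The key point the sketch shows: mappers only see one record at a time, and reducers only see one key with all of its values; everything in between is the framework's shuffle.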
Hive – The Data Warehouse in Hadoop
  • Concepts of Hive
  • Hive Architecture
  • Metastore
  • Driver
  • Thrift Server
  • Web Interface
  • CLI
  • Introduction to HQL (Hive Query Language)
  • The Hive Data Model
  • Partitions
  • Data types
  • Hive Configuration
  • Sample Hive Queries and commands
Pig – Scripting on Hadoop
  • Pig Execution Mode
  • Local Mode
  • MapReduce Mode
  • Pig Engine
  • Pig Latin Scripts
  • Interactive Mode
  • Batch Mode
  • Configuring Pig
  • Sample Pig Scripts
  • Working with Hive: E-Commerce Use Case
  • Working with Pig: Financial Use Case
  • Twitter Use Case: Sentiment Analysis
  • MR Optimization
  • Custom Combiner, Custom Partitioner, and Distributed Cache
  • Advanced MapReduce
  • Datatypes in MapReduce
  • Input Formats in MapReduce
  • Output Formats in MapReduce
  • Joins in MapReduce
  • Reduce side join
  • Replicated join
  • Composite join
  • Use cases of MapReduce
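Of the join strategies listed above, the reduce-side join is the most general, and its mechanics fit the model sketched earlier: mappers tag each record with its source table, the shuffle groups both tables' records by the join key, and the reducer pairs the two sides. A toy Python sketch, with made-up table contents:

```python
from collections import defaultdict

# Toy sketch of a reduce-side join; table contents are invented for illustration.
customers = [(1, "alice"), (2, "bob")]            # (customer_id, name)
orders = [(1, "book"), (1, "pen"), (2, "lamp")]   # (customer_id, item)

def map_side(records, tag):
    """Tag each record with its source table so the reducer can tell them apart."""
    for key, value in records:
        yield key, (tag, value)

def shuffle(pairs):
    """Group tagged records by join key, as the framework's shuffle does."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups

def reduce_side(key, tagged_values):
    """Pair every customer row with every order row sharing this key."""
    names = [v for t, v in tagged_values if t == "C"]
    items = [v for t, v in tagged_values if t == "O"]
    return [(key, n, i) for n in names for i in items]

pairs = list(map_side(customers, "C")) + list(map_side(orders, "O"))
joined = [row for key, vals in shuffle(pairs).items()
          for row in reduce_side(key, vals)]
print(sorted(joined))
```

A replicated (map-side) join avoids this shuffle entirely by broadcasting the smaller table to every mapper via the Distributed Cache, which is why it is preferred when one side fits in memory.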


  • Introduction to Sqoop
  • Sqoop Connectors to RDBMS
  • Importing Data from Sqoop to Hive
  • Sqoop Commands
  • Introduction to Flume
  • Flume Data Model
  • Flume Examples
  • Use Cases of Sqoop and Flume
  • Introduction to Kafka
  • Basic operations
  • Consumer Group Examples
  • Use Cases of Apache Kafka
  • Introduction to NoSQL Databases
  • History of NoSQL
  • RDBMS vs NoSQL Comparison
  • Popular NoSQL Databases
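The core idea behind Kafka consumer groups is that each partition of a topic is consumed by exactly one consumer within a group, so adding consumers scales consumption up to the partition count. The toy round-robin assignment below is a simplification of Kafka's real assignment strategies, purely to show the invariant:

```python
# Toy model of how a Kafka consumer group divides a topic's partitions:
# each partition goes to exactly one consumer in the group (round-robin
# here, a simplification of Kafka's actual assignment strategies).

def assign_partitions(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = list(range(6))  # a topic with 6 partitions
print(assign_partitions(partitions, ["c1", "c2"]))
print(assign_partitions(partitions, ["c1", "c2", "c3", "c4"]))
```

Note that with more consumers than partitions, some consumers would sit idle, which is why partition count caps a group's parallelism.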
HBase – Distributed Columnar Database
  • NoSQL Movement
  • HBase Architecture
  • Region Servers
  • HBase Storage
  • Introduction to Zookeeper
  • Entities of Zookeeper
  • Leader
  • Follower
  • Observer
  • Zookeeper Data Model
  • Configuring HBase and Zookeeper
  • HBase Examples
  • HBase Use Cases
  • Entertainment Use Case
  • Twitter Use Case
  • Health Care Use Case
  • E-Commerce Use Case
  • Bioinformatics Use Case
  • Multi-Node Cluster Deployment
  • HDFS HA Configuration
  • RM HA Configuration
  • Implementing Custom Schedulers
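The HBase storage model covered above is often described as a sorted, multi-dimensional map: row key → column family → column qualifier → value. A minimal in-memory sketch (plain Python, with made-up row and column names) makes that nesting concrete:

```python
from collections import defaultdict

# Toy sketch of HBase's logical data model: a map of
# row key -> column family -> column qualifier -> value.
# Row keys and column names below are invented for illustration.

table = defaultdict(lambda: defaultdict(dict))

def put(row, family, qualifier, value):
    table[row][family][qualifier] = value

def get(row, family, qualifier):
    return table[row][family].get(qualifier)

put("user#1001", "info", "name", "alice")
put("user#1001", "info", "city", "pune")
put("user#1002", "info", "name", "bob")

print(get("user#1001", "info", "name"))
print(sorted(table))  # HBase keeps rows sorted by key across regions
```

In real HBase, each cell is additionally versioned by timestamp, and the sorted row-key space is what region servers split into regions; good row-key design (as in the use cases above) exploits that ordering.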