Big Data Hadoop Administrator

Hadoop Administration Certification Training helps you gain expertise in maintaining complex Hadoop clusters.

  • Using Cloudera Manager features that simplify cluster management, such as aggregated logging, configuration, resources, reports, alerts, and service management
  • The internals of the Hadoop Distributed File System (HDFS), YARN, and MapReduce
  • Determining the correct hardware and infrastructure for installing clusters based on requirements
  • Integrating clusters into the data center using proper cluster configuration and deployment
  • Loading data into the cluster from dynamically generated files using Flume and from an RDBMS using Sqoop
  • Providing service-level agreements for multiple users of a cluster using the Fair Scheduler
  • Troubleshooting, diagnosing, tuning, and solving issues that occur during production and development
  • Advanced topics in real-time event processing using Apache Storm, Kafka, Spark, and NiFi
  • Best practices for preparing and maintaining Apache Hadoop in production
  • Weekend batch: 12 classes of 5 hours each
  • Weekday batch: 30 classes of 2 hours each
  • Total duration: 60 hours
  • Pre-training
  • Actual training
  • Post-training

SYLLABUS

Introduction to Big Data and Hadoop
  • Types of Data
  • Characteristics of Big Data
  • Business Benefits of Big Data Technology
  • Hadoop and Traditional RDBMS
  • Hadoop Core Services
Describe the function of HDFS daemons
  • Describe the normal operation of an Apache Hadoop cluster, both in data storage and in data processing
  • Identify current features of computing systems that motivate a system like Apache Hadoop
  • Classify the major goals of HDFS design
  • Given a scenario, identify the appropriate use case for HDFS Federation
  • Identify the components and daemons of an HDFS HA cluster using the Quorum Journal Manager
  • Analyze the role of HDFS security (Kerberos)
  • Determine the best data serialization choice for a given scenario
  • Describe file read and write paths
  • Identify the commands to manipulate files in the Hadoop File System Shell
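Several of the HDFS objectives above (design goals, read and write paths) rest on the block-and-replica storage model, which a little arithmetic makes concrete. The sketch below assumes the Hadoop 2.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); `hdfs_footprint` is a hypothetical helper for illustration, not part of any Hadoop API.

```python
# Assumed Hadoop 2.x defaults: dfs.blocksize = 128 MiB, dfs.replication = 3.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024
DEFAULT_REPLICATION = 3


def hdfs_footprint(file_size_bytes, block_size=DEFAULT_BLOCK_SIZE,
                   replication=DEFAULT_REPLICATION):
    """Hypothetical helper: return (blocks, block replicas, raw bytes on disk)."""
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    # A file is split into fixed-size blocks; integer ceiling division
    # counts the last, possibly partial, block.
    blocks = (file_size_bytes + block_size - 1) // block_size
    # Every block is stored `replication` times on different DataNodes.
    replicas = blocks * replication
    # A partial last block only uses as much disk as the data it holds,
    # so raw usage is simply file size times replication.
    raw_bytes = file_size_bytes * replication
    return blocks, replicas, raw_bytes
```

Under these assumptions, a 1 GiB file occupies 8 blocks, 24 block replicas, and 3 GiB of raw disk, while a 300 MiB file needs 3 blocks, the last one only partially filled.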
YARN
  • Understand how to deploy core ecosystem components, including Spark, Impala, and Hive
  • Understand how to deploy MapReduce v2 (MRv2 / YARN), including all YARN daemons
  • Understand basic design strategy for YARN and Hadoop
  • Determine how YARN handles resource allocations
  • Identify the workflow of a job running on YARN
  • Determine which configuration files you must change, and how
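One part of how YARN handles resource allocations is the normalization of each container request before it is scheduled. The sketch below models only the memory dimension and assumes Capacity Scheduler-style behavior: requests are rounded up to a multiple of `yarn.scheduler.minimum-allocation-mb` (assumed 1024) and must not exceed `yarn.scheduler.maximum-allocation-mb` (assumed 8192). The Fair Scheduler rounds to its own increment setting instead, and `normalize_memory_request` is a hypothetical name; treat this as a simplified model, not YARN's code.

```python
# Assumed defaults: yarn.scheduler.minimum-allocation-mb = 1024,
#                   yarn.scheduler.maximum-allocation-mb = 8192.
MIN_ALLOC_MB = 1024
MAX_ALLOC_MB = 8192


def normalize_memory_request(request_mb, min_mb=MIN_ALLOC_MB, max_mb=MAX_ALLOC_MB):
    """Hypothetical helper: round a container memory request the way a
    minimum-allocation step would, in a simplified model."""
    if request_mb > max_mb:
        # In this sketch, oversized requests are rejected rather than capped.
        raise ValueError(f"request of {request_mb} MB exceeds maximum {max_mb} MB")
    # Requests smaller than the minimum are raised to the minimum.
    request_mb = max(request_mb, min_mb)
    # Round up to the next multiple of the minimum allocation (the "step").
    return ((request_mb + min_mb - 1) // min_mb) * min_mb
```

With these assumed defaults, a 1500 MB request is granted a 2048 MB container and a 512 MB request a 1024 MB container, which is why requests just above a step boundary can waste noticeable memory per container.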
Hadoop Cluster Installation and Administration
  • Given a scenario, identify how the cluster will handle disk and machine failures
  • Analyze a logging configuration and the logging configuration file format
  • Understand the basics of Hadoop metrics and cluster health monitoring
  • Identify the function and purpose of available tools for cluster monitoring
  • Be able to install all the ecosystem components in CDH 5, including (but not limited to) Impala, Flume, Oozie, Hue, Cloudera Manager, Sqoop, Hive, and Pig
  • Identify the function and purpose of available tools for managing the Apache Hadoop file system
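For the machine-failure scenario above, a useful number is how long the NameNode waits before declaring a silent DataNode dead and re-replicating its blocks. The sketch below reproduces the commonly cited formula, assuming the `hdfs-default.xml` defaults of `dfs.heartbeat.interval` = 3 seconds and `dfs.namenode.heartbeat.recheck-interval` = 300000 ms; the function name is hypothetical.

```python
# Assumed HDFS defaults (hdfs-default.xml):
#   dfs.heartbeat.interval                  = 3 seconds
#   dfs.namenode.heartbeat.recheck-interval = 300000 milliseconds (5 minutes)
HEARTBEAT_INTERVAL_S = 3
RECHECK_INTERVAL_MS = 300_000


def datanode_dead_timeout_ms(heartbeat_interval_s=HEARTBEAT_INTERVAL_S,
                             recheck_interval_ms=RECHECK_INTERVAL_MS):
    """Hypothetical helper: milliseconds without heartbeats before the
    NameNode marks a DataNode as dead."""
    # Two recheck intervals plus ten heartbeat intervals (seconds -> ms).
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s
```

With the defaults this gives 630,000 ms, or 10.5 minutes. Only after that point are the lost node's blocks treated as under-replicated and scheduled for re-replication from the surviving copies, so a brief network blip does not trigger a replication storm.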
Resource Management
  • Understand the overall design goals of each of Hadoop's schedulers
  • Given a scenario, determine how the FIFO Scheduler allocates cluster resources
  • Given a scenario, determine how the Fair Scheduler allocates cluster resources under YARN
  • Given a scenario, determine how the Capacity Scheduler allocates cluster resources
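Scheduler scenarios like the ones above are easier to reason about with max-min fair sharing, the idea underlying the Fair Scheduler. This is a deliberately simplified sketch: it assumes a single resource (for example memory), equal queue weights, and no minimum shares or preemption, all of which the real Fair Scheduler supports. It is a mental model, not the scheduler's implementation, and `max_min_fair_shares` is a hypothetical name.

```python
def max_min_fair_shares(capacity, demands):
    """Split `capacity` among queues so no queue gets more than it demands,
    and capacity a satisfied queue does not need is redistributed equally
    among the rest (max-min fairness with equal weights)."""
    shares = {name: 0.0 for name in demands}
    remaining = float(capacity)
    # Queues that still want more than they have been given.
    active = {name for name, demand in demands.items() if demand > 0}
    while active and remaining > 1e-9:
        # Offer every active queue an equal slice of what is left.
        equal_slice = remaining / len(active)
        satisfied = set()
        for name in active:
            grant = min(demands[name] - shares[name], equal_slice)
            shares[name] += grant
            remaining -= grant
            if shares[name] >= demands[name] - 1e-9:
                satisfied.add(name)
        # Satisfied queues drop out; their unused slice is re-offered next round.
        active -= satisfied
    return shares
```

With a capacity of 100 and queue demands of 20, 50, and 60, the first queue receives its full 20 and the other two split the remaining 80 equally (40 each). A FIFO scheduler, by contrast, would hand resources to the earliest job first regardless of which queue it came from.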
Monitoring and Logging
  • Understand the functions and features of Hadoop’s metric collection abilities
  • Analyze the NameNode and ResourceManager Web UIs (the JobTracker UI on MRv1 clusters)
  • Understand how to monitor cluster daemons
  • Identify and monitor CPU usage on master nodes
  • Describe how to monitor swap and memory allocation on all nodes
  • Identify how to view and manage Hadoop’s log files
  • Interpret a log file
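Interpreting a daemon log file often starts with a quick tally of entries by level. The sketch below assumes the layout used by the `RFA` appender in Hadoop's default `log4j.properties`, `%d{ISO8601} %p %c: %m%n`; clusters with a customized layout will need a different pattern. `summarize_log` is a hypothetical helper, and any log lines you feed it should come from your own cluster.

```python
import re
from collections import Counter

# Assumed log4j layout: %d{ISO8601} %p %c: %m%n
# e.g. "<date> <time>,<millis> <LEVEL> <logger>: <message>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<logger>[^:\s]+): (?P<message>.*)$"
)


def summarize_log(lines):
    """Hypothetical helper: count entries per level and collect
    (timestamp, logger, message) tuples for ERROR and FATAL entries."""
    levels = Counter()
    problems = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match is None:
            # Continuation lines, such as Java stack traces, do not match.
            continue
        level = match.group("level")
        levels[level] += 1
        if level in ("ERROR", "FATAL"):
            problems.append((match.group("timestamp"),
                             match.group("logger"),
                             match.group("message")))
    return levels, problems
```

Because it accepts any iterable of lines, the helper can be pointed at an open file object, for example `summarize_log(open(path))`, where `path` is whatever daemon log location your installation uses.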
PROJECT USE CASES
  • Multi-Node Cluster Deployment
  • HDFS HA Configuration
  • ResourceManager (RM) HA Configuration
  • Implementing Custom Schedulers