This 5-day training course is designed primarily for system administrators and platform architects who need to understand HDP cluster capabilities and manage HDP clusters. Topics include HDP capabilities, Apache Hadoop, Apache YARN, HDFS, and other Hadoop ecosystem components. Students will learn how to administer, manage, and monitor HDP clusters.
PREREQUISITES
Students should be familiar with server or platform software concepts and have a basic understanding of system administration.
TARGET AUDIENCE
Students ranging from those who understand server software concepts to system administrators and platform architects who plan to administer HDP clusters.
FORMAT
50% Lecture/Discussion
50% Hands-on Labs
AGENDA SUMMARY
Day 1: Introduction to Hadoop and Ambari
Day 2: Managing HDFS, YARN Architecture and Management
Day 3: The YARN Capacity Scheduler, High Availability, Monitoring and Backups
Day 4: Advanced HDFS & YARN Services
Day 5: Additional HDP Components and Tuning
DAY 1 OBJECTIVES
Describe Apache Hadoop
Summarize the Purpose of the Hortonworks Data Platform Software Frameworks
List Hadoop Cluster Management Choices
Describe Apache Ambari
Identify Hadoop Cluster Deployment Options
Plan for Hadoop Cluster Deployments
Perform an Interactive HDP Installation using Apache Ambari
Install Apache Ambari
Describe the Differences Between Hadoop Users, Hadoop Service Owners and Ambari Users
Manage Users, Groups and Permissions
Identify Hadoop Configuration Files
Summarize Operations of the Web UI Tool
Manage Hadoop Service Configuration Properties using the Ambari Web UI
Manage Client Configuration Files Using the Command-line Interface
LABS
Setting Up the Lab Environment
Installing HDP
Managing Apache Ambari Users and Groups
Managing Hadoop Services
DAY 2 OBJECTIVES
Describe the Hadoop Distributed File System (HDFS)
Perform HDFS Shell Operations
Use the Ambari Files View
Use WebHDFS
Protect Data using HDFS Access Control Lists (ACLs)
Describe HDFS Architecture and Operation
Manage HDFS using Ambari Web, NameNode and DataNode UIs
Manage HDFS using Command-line Tools
Enable and Manage HDFS Quotas
Identify Reasons to Add, Replace and Delete Worker Nodes
Configure and Run HDFS Balancer
Decommission and Re-commission a Worker Node
Move a Master Component
Summarize the Purpose and Benefits of Rack Awareness
Configure Rack Awareness
LABS
Using Hadoop Storage
Using WebHDFS
Using HDFS Access Control Lists
Managing Hadoop Storage
Managing HDFS Quotas
Adding, Decommissioning, and Re-commissioning Worker Nodes
Configuring Rack Awareness
DAY 3 OBJECTIVES
Describe YARN Resource Management
Summarize YARN Architecture and Operation
Identify and Use YARN Management Options
Summarize YARN Response to Component Failure
Understand the Basics of Running a Sample YARN Application, Including MapReduce and Tez, Apache Pig, and Apache Hive
Summarize the Purpose and Operation of the YARN Capacity Scheduler
Configure and Manage YARN Queues
Control Access to YARN Queues
LABS
Managing the YARN Service Using the Apache Ambari Web UI
Managing the YARN Service Using the CLI Commands
Running Sample YARN Applications
Setting Up for the Capacity Scheduler
Managing YARN Containers and Queues
Managing YARN ACLs and User Limits
Working with YARN Node Labels
DAY 4 OBJECTIVES
Summarize the Purpose of NameNode HA
Configure NameNode HA using Ambari
Summarize the Purpose of ResourceManager HA
Configure ResourceManager HA using Ambari
Summarize the Purpose and Operation of Ambari Metrics
Describe Features and Benefits of the Ambari Dashboard
Summarize the Purpose and Operation of Ambari Alerts
Configure Ambari Alerts
Summarize Hadoop Backup Considerations
Enable and Manage HDFS Snapshots
Copy Data Using DistCp
Use Snapshots and DistCp Together
Identify the Purpose and Operation of Heterogeneous HDFS Storage
Identify HDFS NFS Gateway Use Cases
Install and Configure an HDFS NFS Gateway
Summarize the Purpose and Operation of HDFS Centralized Caching
LABS
Configuring NameNode High Availability
Configuring ResourceManager High Availability
Managing Apache Ambari Alerts
Managing HDFS Snapshots
Using DistCp
Configuring HDFS Storage Policies
Configuring an NFS Gateway
Configuring HDFS Centralized Cache
DAY 5 OBJECTIVES
Configure YARN Queues, Tez, and Hive Properties to Support Performance Goals
Recall Basic Facts About Hive and the Hive Architecture
Recall the Requirements and Benefits of Hive HA
Summarize the Hive HA Architecture and Operation
Configure and Test Hive HA
Recall the Purpose, Job Types, Structure and Benefits of Oozie
Install and Configure Oozie using Ambari
Deploy and Manage a Sample Oozie Workflow
Identify Characteristics of Ambari Local Versus LDAP Users and Groups
Integrate Ambari Server with LDAP
Summarize the Purpose and Benefits of Ambari Blueprints
Recall the Process Used to Deploy a Cluster Using Ambari Blueprints