This course provides a technical overview of Apache Hadoop. It includes high-level information about the concepts, architecture, operation, and uses of the Hortonworks Data Platform (HDP) and the Hadoop ecosystem. It also serves as an optional primer for those who plan to attend a hands-on, instructor-led course.
PREREQUISITES
No previous Hadoop or programming knowledge is required. Students need a web browser with Internet access.
TARGET AUDIENCE
Data architects, data integration architects, managers, C-level executives, decision makers, technical infrastructure teams, and Hadoop administrators or developers who want to understand the fundamentals of Big Data and the Hadoop ecosystem.
FORMAT
100% Lecture
Instructor Discussion
AGENDA SUMMARY
Day 1: Hadoop Overview and Demonstrations
DAY 1 OBJECTIVES
Describe the case for Hadoop
Identify the Hadoop ecosystem and architecture:
Data Management – HDFS, YARN
Data Access – Pig, Hive, HBase, Storm, Solr, Spark
Data Governance & Integration – Falcon, Flume, Sqoop, Kafka, Atlas
Security – Kerberos, Ranger, Knox
Operations – Ambari, ZooKeeper, Oozie, Cloudbreak
Observe popular data transformation and processing engines in action: Apache Hive, Apache Pig, Apache Spark
Detail the architecture and features of YARN
Describe backup and recovery options
Describe how to secure Hadoop
Explain the fundamentals of parallel processing
Describe data ingestion options and frameworks for batch and real-time streaming