This four-day training course is designed for developers who need to create real-time applications that ingest and process streaming data sources using the Hortonworks Data Platform (HDP) and Hortonworks DataFlow (HDF) environments. Specific technologies covered include Apache Hadoop, Apache Kafka, Apache Storm, Apache Spark, and Apache HBase, as well as Apache NiFi. The highlight of the course is the custom workshop-style labs in which participants build streaming applications with Storm and Spark Streaming.
PREREQUISITES
Students should be familiar with programming principles and have experience in software development. Java programming experience is required, and knowledge of SQL and light scripting is also helpful. No prior Hadoop knowledge is required.
TARGET AUDIENCE
Developers and data engineers who need to understand and develop real-time/streaming applications on HDP and HDF
AGENDA SUMMARY
Day 1: Real-Time Architecture and Components
Day 2: Real-Time Processing with Spark Streaming
Day 3: Real-Time Processing with Storm
Day 4: Building DataFlows with HDF/NiFi
DAY 1 OBJECTIVES
Describe the Real-Time Architecture
Define the Purpose and Function of Apache Hadoop
Describe the Hadoop Ecosystem Frameworks
Describe the Role of Hadoop in the Datacenter
Describe the Hadoop Distributed File System (HDFS)
Detail the Major Architectural Components of HDFS and their Interactions
Demonstrate How to Use Apache Zeppelin with Apache Spark
List the Major Functions of Apache Zeppelin
Describe the Purpose and Benefits of Apache Spark
List the Spark High-Level Tools
Define Spark REPLs and Application Architecture
Explain the Purpose and Function of Resilient Distributed Datasets (RDDs)
List the Characteristics of an RDD
Explain Spark Programming Basics
Define and Use Basic Spark Transformations
Define and Use Basic Spark Actions
Describe an Anonymous Function
Invoke Functions for Multiple RDDs, Create Named Functions, and Use Numeric Operations
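The Day 1 transformation/action pipeline (flatMap, filter, map, then an action such as reduce) can be sketched with plain Python built-ins. This is a conceptual illustration of the RDD semantics only, not the PySpark API, and the sample data is made up; in a real Spark REPL the same pipeline would be lazy until the action runs.

```python
from functools import reduce

# Sample input, standing in for an RDD created from a text file.
lines = ["spark makes rdds", "rdds are immutable", "actions trigger evaluation"]

# Transformations (lazy in real Spark): flatMap -> filter -> map.
words = [w for line in lines for w in line.split()]   # flatMap
long_words = [w for w in words if len(w) > 4]         # filter
pairs = [(w, 1) for w in long_words]                  # map

# Action (eager in real Spark): reduce materializes a single result.
total = reduce(lambda a, b: a + b, (n for _, n in pairs))
print(total)  # number of words longer than four characters
```

The anonymous (lambda) function passed to reduce corresponds to the anonymous-function objective covered the same day.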
LABS
Validating the Lab Environment
Using HDFS Commands
Introduction to Spark REPLs and Zeppelin
Creating and Manipulating RDDs
DAY 2 OBJECTIVES
Define and Create Pair RDDs
Perform Common Operations on Pair RDDs
Describe Spark Streaming
Create and View Basic Data Streams
Perform Basic Transformations on Streaming Data
Utilize Window Transformations on Streaming Data
Recognize Use Cases for Apache Kafka
Explain the Concept of a Topic Leader and Followers
Describe the Publication and Consumption of Kafka Messages
Describe the Function and Purpose of Apache HBase
List Apache HBase Key Features
List the Components of the Apache HBase Architecture
Describe Apache HBase as a Set of Value Mappings
Identify Apache HBase as Either a Row- or Column-Oriented Database
Demonstrate How to Invoke the HBase Shell
List General HBase Commands
List HBase Table Management Commands
List HBase Data Manipulation Commands
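Day 2's pair-RDD objectives center on keyed operations such as reduceByKey. The sketch below shows the same semantics with a plain-Python dictionary (it is not the PySpark API, and the sample keys are invented): a pair RDD is a collection of (key, value) tuples, and reduceByKey merges each key's values with an associative function.

```python
# Plain-Python sketch of pair-RDD semantics, not the PySpark API.
pairs = [("kafka", 1), ("storm", 1), ("kafka", 1), ("spark", 1), ("kafka", 1)]

counts = {}
for key, value in pairs:
    # Equivalent in spirit to reduceByKey(lambda a, b: a + b).
    counts[key] = counts.get(key, 0) + value

print(counts)
```

In real Spark the merge happens per-partition first and then across partitions, which is why the function must be associative.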
LABS
Creating and Manipulating Pair RDDs
Basic Spark Streaming
Basic Spark Streaming Transformations
Spark Streaming Window Transformations
Creating and Managing Apache Kafka Topics
Using the HBase Shell
Working with HBase Column Families
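HBase's "set of value mappings" data model from the Day 2 objectives can be pictured as nested maps: row key, then column family, then column qualifier, then timestamp, then cell value. The following plain-Python sketch illustrates that layering (the table, family, and column names are invented for illustration; this is not the HBase client API):

```python
# Nested-dict sketch of the HBase data model:
# row key -> column family -> column qualifier -> timestamp -> cell value.
table = {
    "row-001": {
        "info": {                                        # column family
            "name": {1700000000: "sensor-a"},            # one version
            "state": {1700000100: "idle",                # two versions,
                      1700000300: "active"},             # keyed by timestamp
        },
    },
}

def get_latest(table, row, family, qualifier):
    """Return the most recent cell version, as an HBase 'get' does by default."""
    versions = table[row][family][qualifier]
    return versions[max(versions)]

print(get_latest(table, "row-001", "info", "state"))  # newest version wins
```

Keeping multiple timestamped versions per cell is what distinguishes this model from a flat row/column table.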
DAY 3 OBJECTIVES
Define the Terms Tuple, Stream, Topology, Spout, Bolt, Nimbus and Supervisor
Diagram the Relationship Between a Supervisor, Worker Process, Executor and a Task
Diagram how Storm Components Interact to Provide Scalable, Distributed and Parallel Computation of Real-time Data
Given the Java Code for a Topology, Diagram the Spout and Bolt Connections
Define the Purpose of a Stream Grouping
List the Types of Stream Groupings
Recognize and Explain Sample Spout and Bolt Java Code
List Functions that Apache ZooKeeper Provides to Apache Storm
List the Differences Between Storm Local Mode and Distributed Mode
Given a Topology Code Example, Describe the Spout and Bolt Connections in the Topology
Describe How to Integrate Apache Storm with Apache Kafka
List Tools Used to Manage Apache Storm
Display Online Help Using the Storm Command-line Client
Identify How to Open the Storm UI Console
Interpret the Metrics Displayed in the Apache Storm UI Console
Identify the Differences Between Reliable and Unreliable Operation
Diagram a Tuple Tree and Identify its Branches
List the Two Requirements for Reliable Operation
Given a Diagram, Describe the Operation of an Acker Task
Describe the Responses to Various Apache Storm Component Failures
List Three Methods to Disable Reliable Operation
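Day 3's word-count topology wires a spout to a split bolt to a count bolt. The sketch below illustrates that tuple flow conceptually in plain Python (it is not the Storm Java API; real topologies are assembled with TopologyBuilder, and the sentences here are invented sample data):

```python
# Conceptual sketch of Storm's spout -> bolt -> bolt tuple flow.
def sentence_spout():
    """Spout: emits a stream of sentence tuples into the topology."""
    for sentence in ["storm counts words", "storm groups streams"]:
        yield (sentence,)

def split_bolt(tuples):
    """Bolt: splits each incoming sentence tuple into word tuples."""
    for (sentence,) in tuples:
        for word in sentence.split():
            yield (word,)

def count_bolt(tuples):
    """Bolt: keeps a running count per word (as with a fields grouping on 'word')."""
    counts = {}
    for (word,) in tuples:
        counts[word] = counts.get(word, 0) + 1
    return counts

counts = count_bolt(split_bolt(sentence_spout()))
print(counts)  # {'storm': 2, 'counts': 1, 'words': 1, 'groups': 1, 'streams': 1}
```

In real Storm each stage runs as parallel executors across workers, and a fields grouping on the word field ensures every occurrence of a given word reaches the same count-bolt task.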
LABS
Creating a Word Count Topology
Performing a Kafka Word Count
Using Storm with Kafka and HBase
DAY 4 OBJECTIVES
Define Enterprise Data Flow
Describe the Purpose and Function of HDF 2.0
Describe HDF 2.0 Components
Describe How IoT is Driving New Requirements
List NiFi Architecture and Features
List the Three Key Concepts of Apache NiFi
Install and Configure NiFi
List Configuration Best Practices
Describe the Components of the NiFi User Interface
Define the Anatomy of a Processor
Define the Anatomy of a Connection
Describe the Purpose of the Controller Services and Reporting Tasks