Udacity-Data-Streaming-Nanodegree
Udacity - Data Streaming Nanodegree Program
Build up-to-date skills for processing data in real time by gaining fluency in modern data engineering tools such as Apache Spark, Kafka, Spark Streaming, and Kafka Streaming.
- Understand the components of data streaming systems. Ingest data in real time using Apache Kafka and Spark, and run analyses on it.
- Use the Faust Stream Processing Python library to build a real-time stream-based application. Compile real-time data and run live analytics, as well as draw insights from reports generated by the streaming console.
- Learn about the Kafka ecosystem, and the types of problems each solution is designed to solve. Use the Confluent Kafka Python library for simple topic management, production, and consumption.
- Explain the components of Spark Streaming (architecture and API), integrate Apache Spark Structured Streaming and Apache Kafka, manipulate data using Spark, and read DataFrames in the Spark Streaming Console.
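As a rough illustration of the production and consumption mentioned in the list above, here is a minimal sketch built on the `confluent_kafka` package. The broker address `localhost:9092`, the topic name `example.events`, the group id, and all helper names are assumptions made for illustration; none of them come from the course material.

```python
import json


def encode_event(event: dict) -> bytes:
    """Serialize an event to UTF-8 JSON bytes (Kafka message values are bytes)."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def decode_event(raw: bytes) -> dict:
    """Inverse of encode_event."""
    return json.loads(raw.decode("utf-8"))


def produce_events(events, topic="example.events", broker="localhost:9092"):
    """Send each event to a topic. Broker and topic are hypothetical defaults."""
    from confluent_kafka import Producer  # third-party; imported lazily

    producer = Producer({"bootstrap.servers": broker})
    for event in events:
        producer.produce(topic, value=encode_event(event))
    producer.flush()  # block until queued messages are delivered


def consume_events(max_messages, topic="example.events",
                   broker="localhost:9092", group_id="example-group",
                   max_empty_polls=10):
    """Read up to max_messages events, giving up after repeated empty polls."""
    from confluent_kafka import Consumer  # third-party; imported lazily

    consumer = Consumer({
        "bootstrap.servers": broker,
        "group.id": group_id,
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe([topic])
    received, empty_polls = [], 0
    try:
        while len(received) < max_messages and empty_polls < max_empty_polls:
            msg = consumer.poll(1.0)  # wait up to one second for a message
            if msg is None:
                empty_polls += 1
                continue
            if msg.error():
                continue  # skip error events in this sketch
            empty_polls = 0
            received.append(decode_event(msg.value()))
    finally:
        consumer.close()
    return received
```

The serialization helpers run without a broker; `produce_events` and `consume_events` need a reachable Kafka cluster and the `confluent_kafka` package installed.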
Course 1 - Data Ingestion with Apache Kafka
Demonstrate knowledge of data streaming tools, including Kafka Consumers, Producers, and Topics; Kafka Connect Sources and Sinks; Kafka REST Proxy for producing data over REST; Data Schemas with JSON and Apache Avro/Schema Registry; Stream Processing with the Faust Python Library; and Stream Processing with KSQL.
Contents
- Introduction to Stream Processing
- Apache Kafka
- Data Schemas and Apache Avro
- Kafka Connect and REST Proxy
- Stream Processing Fundamentals
- Stream Processing with Faust
- KSQL
Project
- Optimize Chicago Bus and Train Availability Using Kafka
Course 2 - Streaming API Development and Documentation
Grow expertise in streaming data systems by building a continuous application with Spark Structured Streaming: consume and process data from Apache Kafka, create a DataFrame as an aggregation of source DataFrames, sink the composite DataFrame to Kafka, and visually inspect the data sink for accuracy.
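A minimal sketch of the Kafka-to-Spark step described above might look like the following. It reads a topic as a streaming DataFrame and prints it to the console for visual inspection. The broker address, topic name, application name, and function names are hypothetical, and the Spark session is assumed to be launched with the `spark-sql-kafka` connector available.

```python
def kafka_source_options(broker: str, topic: str) -> dict:
    """Options for Spark's built-in "kafka" streaming source."""
    return {
        "kafka.bootstrap.servers": broker,
        "subscribe": topic,
        "startingOffsets": "earliest",
    }


def stream_kafka_to_console(broker="localhost:9092", topic="example.events"):
    """Stream a Kafka topic to the console. Broker and topic are assumptions."""
    from pyspark.sql import SparkSession  # third-party; imported lazily

    spark = SparkSession.builder.appName("kafka-console-sketch").getOrCreate()

    # The Kafka source yields binary key/value columns plus metadata columns.
    raw = (
        spark.readStream
        .format("kafka")
        .options(**kafka_source_options(broker, topic))
        .load()
    )

    # Cast the binary columns to strings so they are readable on the console.
    decoded = raw.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
    )

    query = (
        decoded.writeStream
        .outputMode("append")
        .format("console")
        .start()
    )
    query.awaitTermination()
```

Writing back to Kafka would swap the console sink for `format("kafka")` with a target topic and a checkpoint location; that part is left out of this sketch.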
Contents
- Streaming DataFrames
- Joins and JSON
- Redis, Base64 and JSON
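The last lesson title suggests JSON documents that travel in Base64-encoded form. This README does not describe the actual payload layout, so the following is only a sketch under that assumption, with hypothetical field and function names, using the Python standard library.

```python
import base64
import json


def encode_payload(record: dict) -> str:
    """Serialize a record to JSON, then Base64-encode it as ASCII text."""
    json_bytes = json.dumps(record).encode("utf-8")
    return base64.b64encode(json_bytes).decode("ascii")


def decode_payload(encoded: str) -> dict:
    """Reverse encode_payload: Base64-decode, then parse the JSON document."""
    json_bytes = base64.b64decode(encoded)
    return json.loads(json_bytes.decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical record; the real course data may look different.
    sample = {"customer": "example", "score": 7}
    wrapped = encode_payload(sample)
    print(wrapped)
    print(decode_payload(wrapped))
```

In a streaming job the same two steps (Base64 decode, then JSON parse) would typically be applied to a string column rather than to a single Python value.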
Project
- Evaluate Human Balance with Spark Streaming