Data-Engineer-Nanodegree-Projects-Udacity
Projects done in the Data Engineer Nanodegree Program by Udacity.com
Course 1: Data Modeling
Introduction to Data Modeling
- Understand the purpose of data modeling
- Identify the strengths and weaknesses of different types of databases and data storage techniques
- Create a table in Postgres and Apache Cassandra
Relational Data Models
- Understand when to use a relational database
- Understand the difference between OLAP and OLTP databases
- Create normalized data tables
- Implement denormalized schemas (e.g. star, snowflake)
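The fact/dimension split behind a star schema can be sketched in a few lines of SQL. The snippet below uses SQLite as a lightweight stand-in for Postgres, and the table names (`songplays`, `users`, `songs`) are illustrative, not the project's actual schema:

```python
# Minimal star-schema sketch; SQLite stands in for Postgres here.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables hold descriptive attributes
cur.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE songs (song_id INTEGER PRIMARY KEY, title TEXT)")

# The fact table records events and references the dimensions
cur.execute("""CREATE TABLE songplays (
    songplay_id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users (user_id),
    song_id INTEGER REFERENCES songs (song_id)
)""")

cur.execute("INSERT INTO users VALUES (1, 'Ada')")
cur.execute("INSERT INTO songs VALUES (10, 'Blue Train')")
cur.execute("INSERT INTO songplays VALUES (100, 1, 10)")

# A typical star-schema query: join the fact to its dimensions
row = cur.execute("""
    SELECT u.name, s.title
    FROM songplays sp
    JOIN users u ON sp.user_id = u.user_id
    JOIN songs s ON sp.song_id = s.song_id
""").fetchone()
print(row)  # ('Ada', 'Blue Train')
```

The key property is that analytic queries fan out from one central fact table through short joins, rather than chaining through many normalized tables.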
NoSQL Data Models
- Understand when to use NoSQL databases and how they differ from relational databases
- Select the appropriate primary key and clustering columns for a given use case
- Create a NoSQL database in Apache Cassandra
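Cassandra's primary key has two parts: the partition key decides which partition a row lands in, and clustering columns order rows inside that partition. The pure-Python sketch below models that behavior conceptually; it is not the driver API, and the column names (`session_id`, `item_in_session`) are illustrative:

```python
# Conceptual model of PRIMARY KEY ((session_id), item_in_session):
# partition key groups rows, clustering column orders them within a group.
from collections import defaultdict

rows = [
    {"session_id": 338, "item_in_session": 4, "song": "Music Matters"},
    {"session_id": 338, "item_in_session": 1, "song": "Regret"},
    {"session_id": 139, "item_in_session": 0, "song": "Pump It"},
]

partitions = defaultdict(list)
for r in rows:
    partitions[r["session_id"]].append(r)          # partition key groups rows
for part in partitions.values():
    part.sort(key=lambda r: r["item_in_session"])  # clustering column orders them

songs_in_338 = [r["song"] for r in partitions[338]]
print(songs_in_338)  # ['Regret', 'Music Matters']
```

This is why the key must be chosen per query: a lookup is cheap only when its filter matches the partition key, and results come back pre-sorted by the clustering columns.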
Project 1: Data Modeling with Postgres and Apache Cassandra
Course 2: Cloud Data Warehouses
Introduction to Data Warehouses
- Understand Data Warehousing architecture
- Run an ETL process to denormalize a database (3NF to Star)
- Create an OLAP cube from facts and dimensions
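An OLAP cube is just the measure of a fact table pre-aggregated over every combination of dimensions. A small pure-Python rollup makes the idea concrete; the data and dimension names here are illustrative:

```python
# Tiny OLAP-style cube: aggregate a measure over all dimension subsets
# (a GROUP BY CUBE sketch), including the grand total.
from itertools import combinations
from collections import defaultdict

facts = [
    {"city": "SF", "month": "Jan", "revenue": 100},
    {"city": "SF", "month": "Feb", "revenue": 150},
    {"city": "NY", "month": "Jan", "revenue": 200},
]

dims = ("city", "month")
cube = defaultdict(int)
for row in facts:
    # every subset of dimensions gets a rollup; the empty subset is the total
    for r in range(len(dims) + 1):
        for subset in combinations(dims, r):
            key = tuple((d, row[d]) for d in subset)
            cube[key] += row["revenue"]

print(cube[()])                 # 450 -- grand total
print(cube[(("city", "SF"),)])  # 250 -- SF rolled up across months
```

Slicing and dicing then become dictionary lookups instead of scans over the raw facts.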
- Compare columnar vs. row-oriented storage approaches
Introduction to the Cloud with AWS
- Understand cloud computing
- Create an AWS account and understand its services
- Set up Amazon S3, IAM, VPC, EC2, and RDS PostgreSQL
Implementing Data Warehouses on AWS
- Identify components of the Redshift architecture
- Run ETL process to extract data from S3 into Redshift
- Set up AWS infrastructure using Infrastructure as Code (IaC)
- Design an optimized table by selecting the appropriate distribution style and sorting key
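Distribution style and sort key are set in the table DDL. A hedged sketch of what such a statement can look like, with illustrative table and column names (not the project's actual schema):

```python
# Sketch of Redshift table tuning: DISTKEY co-locates rows that join on
# the same column, SORTKEY speeds filters over the sorted column.
ddl = """
CREATE TABLE songplays (
    songplay_id BIGINT IDENTITY(0, 1),
    start_time  TIMESTAMP NOT NULL,
    user_id     INTEGER   NOT NULL
)
DISTSTYLE KEY
DISTKEY (user_id)      -- distribute by a common join column
SORTKEY (start_time);  -- sort by a common filter column
"""
print(ddl)
```

The trade-off to study is skew: a good DISTKEY spreads rows evenly across slices while still co-locating join partners.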
Project 2: Data Infrastructure on the Cloud
Course 3: Data Lakes with Spark
The Power of Spark
- Understand the big data ecosystem
- Understand when to use Spark and when not to use it
Data Wrangling with Spark
- Manipulate data with SparkSQL and Spark Dataframes
- Use Spark for ETL purposes
Debugging and Optimization
- Troubleshoot common errors and optimize Spark code using the Spark Web UI
Introduction to Data Lakes
- Understand the purpose and evolution of data lakes
- Implement data lakes on Amazon S3, EMR, Athena, and AWS Glue
- Use Spark to run ELT processes and analytics on data of diverse sources, structures, and vintages
- Understand the components and issues of data lakes
Project 3: Big Data with Spark
Course 4: Automate Data Pipelines
Data Pipelines
- Create data pipelines with Apache Airflow
- Set up task dependencies
- Create data connections using hooks
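The core of a pipeline is its dependency graph: Airflow's `task_a >> task_b` declares edges, and the scheduler only runs a task once its upstream tasks have finished. The stdlib sketch below models that resolution step; the task names are illustrative, and this is a conceptual model, not the Airflow API:

```python
# A DAG as an adjacency map (task -> set of upstream tasks), resolved
# into a valid run order -- what a scheduler guarantees for you.
from graphlib import TopologicalSorter  # Python 3.9+

# conceptually: stage >> [load_songs, load_users] >> quality_check
dag = {
    "load_songs": {"stage"},
    "load_users": {"stage"},
    "quality_check": {"load_songs", "load_users"},
}

order = list(TopologicalSorter(dag).static_order())
print(order)  # 'stage' first, 'quality_check' last
```

`TopologicalSorter` also raises on cycles, which is the same constraint that makes an Airflow workflow a DAG rather than an arbitrary graph.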
Data Quality
- Track data lineage
- Set up data pipeline schedules
- Partition data to optimize pipelines
- Write tests to ensure data quality
- Backfill data
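Two of the habits above can be sketched in plain Python: a fail-loud row-count quality check, and enumerating the execution dates a backfill would re-run. The function names and sample data are illustrative, not from any Airflow API:

```python
# Sketch of a data-quality gate and a backfill date range.
from datetime import date, timedelta

def check_has_rows(table: str, records: list) -> None:
    """Fail loudly if a load produced no rows (a basic quality check)."""
    if not records:
        raise ValueError(f"Data quality check failed: {table} is empty")

def backfill_dates(start: date, end: date):
    """Yield each daily execution date a backfill from start to end covers."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)

check_has_rows("songplays", [("some", "row")])  # passes silently
dates = list(backfill_dates(date(2024, 1, 1), date(2024, 1, 3)))
print(len(dates))  # 3 daily runs: Jan 1, 2, 3
```

Schedulers like Airflow apply the same idea automatically: each date in the range becomes one idempotent pipeline run, so partitioned loads can be replayed safely.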
Production Data Pipelines
- Build reusable and maintainable pipelines
- Build your own Apache Airflow plugins
- Implement subDAGs
- Set up task boundaries
- Monitor data pipelines
Project 4: Data Pipelines with Airflow
