emr-cluster topic
spark-boilerplate
A boilerplate for Spark projects, with Docker support for local development and scripts for EMR support.
Repo-2019
BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, TensorFlow, Mathematics
goodreads_etl_pipeline
An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.
terraform-aws-emr-cluster
Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS
aws-dbs-refarch-datalake
Reference Architectures for Datalakes on AWS
pyspark-on-aws-emr
The goal of this project is to offer a ready-to-use AWS EMR template using Spot Fleet and On-Demand Instances, so you can focus on writing PySpark code.
demo-code
Bits of code I use during live demos
Udacity-Data-Engineer-nanodegree
Classwork projects and homework completed for the Udacity Data Engineering Nanodegree
terraform-emr-spark-example
An example Terraform project that configures a secure and customizable Spark cluster on Amazon EMR.