Sanchit Kumar
goodreads_etl_pipeline
An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.
Udacity-Data-Engineering-Projects
A few projects related to Data Engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
Cloudera_Material
Study material to help people prepare for the Cloudera CCA Spark and Hadoop Developer Exam (CCA175). Feel free to collaborate.
Optimizing-Public-Transportation
A real-time event pipeline built around the Kafka ecosystem for the Chicago Transit Authority.
data-engineer-roadmap
Lessons learned from multiple companies in Silicon Valley: Netflix, Facebook, Google, and startups.
Big_Data_Project
Fake News Detection: feature extraction using vectorization methods such as Count Vectorizer, TF-IDF Vectorizer, and Hash Vectorizer, followed by an ensemble model that classifies whether a news item is fake.