MLZoomcamp2022
My repo for the Machine Learning Engineering bootcamp 2022 by DataTalks.Club
MLZoomcamp2022 by DataTalks.Club
This is a 4-month practical bootcamp that puts you on the fast track to becoming a Machine Learning Engineer:
- It starts from the basics of Machine Learning and Deep Learning, following the CRISP-DM methodology
- It looks at the internal mechanisms of some of the most popular ML algorithms, such as Linear Regression, Logistic Regression, Decision Trees, and ensemble methods such as Random Forest and XGBoost, as well as Neural Networks for Deep Learning
- It then moves on to packaging ML/DL models with Docker, Flask and BentoML and deploying them to cloud services
- The cloud services used are AWS EC2 and Lambda (for serverless computing)
- Additionally, it covers some essential advanced topics such as working with tflite and TensorFlow Serving, as well as Kubernetes and KServe
- Finally, it ties everything together with 2-3 student-led projects that employ the tools and knowledge learned in the bootcamp
The highlights of the course are:
- Focus on collaborative problem solving, getting hands-on with git and sharing in public via notes and write-ups
- Weekly homework that serves as a guided walk-through of the concepts learned during the week
- Peer-reviewing and evaluation of projects
Dates | Coursework Link | Coursework Dataset | Notes | Homework Link | Homework Dataset | Solution Link |
---|---|---|---|---|---|---|
5-12 September 2022 | Week 1: Intro to ML/Environment Setup | --- | Linear Algebra refresher | Homework 1 | Car Dataset | Solution Jupyter notebook |
13-19 September 2022 | Week 2: Linear Regression | Car Dataset | Visual overview for structuring a ML project | Homework 2 | California Housing Prices | Solution Jupyter notebook |
20-26 September 2022 | Week 3: Classification | Telco Customer Churn | Visual overview of EDA + Feature Engineering | Homework 3 | California Housing Prices | Solution Jupyter notebook |
27 September - 3 October 2022 | Week 4: Evaluation of ML Models | Telco Customer Churn | Detailed ROC curve & evaluation metrics overview | Homework 4 | AER Credit Card Data | Solution Jupyter notebook |
04-10 October 2022 | Week 5: ML Deployment with Flask/Docker/AWS EC2 | Telco Customer Churn | WSL + Docker setup guide & Flask App and Dockerization notes | Homework 5 | AER Credit Card Data | Solution Jupyter notebook |
11-17 October 2022 | Week 6: Decision Trees and Ensemble Learning | Credit Scoring | Visual overview of decision tree and ensemble models | Homework 6 | California Housing Prices | Solution Jupyter Notebook |
18-24 October 2022 | Week 7: Production-ready ML with BentoML/AWS Fargate | Credit Scoring | Setting up and serving BentoML with WSL | Homework 7 | Credit Scoring | Solution Jupyter Notebook |
25 October - 7 November 2022 | Mids Week: Do your own project | --- | Detailed Instructions | Evaluation Criteria | Traffic Violation Dataset | Detailed description of project and instructions to reproduce |
8-21 November 2022 | Week 8: Deep Learning / CNN | Clothing Dataset (small) | Installing TensorFlow + Connecting GitHub, VSCode and Saturn Cloud | Homework 8 | Dino or Dragon | Solution Jupyter Notebook |
22-28 November 2022 | Week 9: tflite/Serverless with AWS Lambda | Clothing dataset | Overview | Homework 9 | Dino or Dragon | Solution Jupyter Notebook |
29 November - 5 December 2022 | Week 10: TF Serving/Kubernetes | Clothing dataset | TBA | Homework 10 | Credit Scoring | Solution Jupyter Notebook |