Data-Science-and-Machine-Learning-Projects-Dojo

Collections of data science, machine learning and data visualization projects with pandas, sklearn, matplotlib, TensorFlow 2, Keras, and various ML algorithms such as random forest classifiers and boosting.

Data Science, Machine Learning & Visualization Dojo

Collections of Data Science & ML projects and dojo where I practice Data Science, Machine Learning, Deep Learning and Data Visualization related skills, theories, probability, statistics, etc.

Built with

Machine Learning, Deep Learning, Data Science libraries

  • NumPy - package for scientific computing with Python
  • Pandas - fast, powerful, flexible and easy to use open source data analysis and manipulation tool
  • Pandas Profiling - generate reports from dataframe
  • GeoPandas - support for geographic data in pandas objects.
  • Scikit-learn - Simple and efficient tools for predictive data analysis
  • TensorFlow - An end-to-end open source machine learning platform
  • Keras - Deep Learning framework
  • NLTK - Natural Language Toolkit
  • dlib - A toolkit for making real world machine learning and data analysis applications in C++
  • Face Recognition - The world's simplest facial recognition api for Python and the command line

Data Visualization libraries

  • Matplotlib - a comprehensive library for creating static, animated, and interactive visualizations in Python
  • Seaborn - statistical data visualization
  • Bokeh - interactive visualization library for modern web browsers
  • Plotly - The front-end for ML and data science models
  • Cufflinks - Productivity Tools for Plotly + Pandas

Turning into Web applications

  • Streamlit - The fastest way to build and share data apps
  • Flask - a micro web framework written in Python

Spark

  • Apache Spark - a unified analytics engine for large-scale data processing.
  • Spark with PySpark - PySpark is the Python API for Apache Spark
  • Databricks - Unified Data Analytics Platform - One cloud platform for massive scale data engineering and collaborative data science.

Tools and Datasources


Projects

Breast Cancer Tumor Diagnostic - Classification Project

Fandango movie ratings - Capstone Project

Data Analysis and Visualization Capstone project from Machine Learning and Datascience Masterclass Course.

  • This is the data behind the story Be Suspicious Of Online Movie Ratings, Especially Fandango’s
  • using data from 538
  • If you are planning on going out to see a movie, how well can you trust online reviews and ratings, especially if the same company showing the rating also makes money by selling movie tickets?
  • Do they have a bias towards rating movies higher than they should be rated?
  • etc..

Supervised Learning Capstone Project - Cohort Analysis & Customer Churn Predictions

  • This project builds a machine learning model to predict whether or not a customer will churn.
  • Includes cohort analysis based on Telco subscribers' contract type, etc.

Predicting Heart Disease - Classification Project

Milestone project from Complete Machine Learning and Data Science - Zero to Mastery course.

Predicting Bulldozer Sale Price - Regression Project

Milestone project from Complete Machine Learning and Data Science - Zero to Mastery course.

Deep Learning ANN Project - Dog breed predictions

Project from Complete Machine Learning and Data Science - Zero to Mastery course.

911 Calls - Data Capstone Project

Data Analysis and Visualization Capstone project from Data Science and Machine Learning Bootcamp Course.

  • analyzing 911 calls data from kaggle
  • top 5 zip codes for 911 calls
  • top 5 townships for 911 calls
  • most common Reason for a 911
  • different types of visualizations based on the findings
  • etc..
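
The top-5 questions above typically reduce to a `value_counts()` call. The following is a minimal sketch on a tiny made-up frame, not the notebook's actual code. The column names `zip`, `twp` and `title`, and the "Reason: detail" layout of `title`, are assumptions about the Kaggle file.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle 911 data; column names are assumed.
calls = pd.DataFrame({
    "zip": [19401, 19401, 19464, 19403, 19401, 19464],
    "twp": ["NORRISTOWN", "NORRISTOWN", "POTTSTOWN",
            "LOWER PROVIDENCE", "NORRISTOWN", "POTTSTOWN"],
    "title": ["EMS: FALL VICTIM", "Traffic: VEHICLE ACCIDENT",
              "EMS: CARDIAC EMERGENCY", "Fire: FIRE ALARM",
              "EMS: HEAD INJURY", "Traffic: DISABLED VEHICLE"],
})

# value_counts() sorts by frequency, largest first; head(5) keeps the top five.
top_zips = calls["zip"].value_counts().head(5)
top_townships = calls["twp"].value_counts().head(5)

# Assumed convention: the text before the colon is the reason for the call.
calls["reason"] = calls["title"].str.split(":").str[0]
most_common_reason = calls["reason"].value_counts().idxmax()

print(top_zips)
print(top_townships)
print(most_common_reason)
```

On the real data the same three calls would run against the loaded CSV instead of the hand-built frame.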

ML App - Random Forest Algorithm - ML Project

  • Machine learning app using streamlit, for building a regression model using the Random Forest algorithm.
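
The modelling step behind such an app could look roughly like the sketch below. It leaves out the Streamlit UI entirely and uses synthetic data from `make_regression`; the dataset, split ratio and hyperparameter values are assumptions for illustration, not the app's actual settings.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data stands in for whatever dataset the app loads.
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# n_estimators is the kind of hyperparameter an app could expose as a widget.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Evaluate on data the model has not seen during training.
r2 = r2_score(y_test, model.predict(X_test))
print(f"R^2 on held-out data: {r2:.3f}")
```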

Machine Learning & Data Science Projects

Masterclass Projects

  • Ames Housing Data Project - Linear Regression
  • Heart Disease Detection Project - Logistic Regression
  • Sonar Data - Detecting Rock or Mine Project - KNN
  • Wine Fraud Detection Project - SVM
  • Mushroom Edible or Poisonous Prediction Project - with AdaBoost
  • Mushroom Edible or Poisonous Prediction Project - with Gradient Boosting
  • Ecommerce Project - Linear Regression
  • Advertisement Project - Logistic Regression
  • Anonymized Data Project - KNN
  • Supervised Learning Capstone Project - Cohort Analysis & Customer Churn Predictions
  • NLP - Flight Tweets Sentiment Analysis - Classification
  • NLP - Movie Review Sentiment Analysis - Classification
  • Color Quantization - KMeans
  • CIA Country Analysis and Clustering - KMeans
  • Cars Model - Hierarchical Clustering
  • Wholesale Customers - DBSCAN Clustering
  • Breast Cancer - PCA Manual Implementation
  • Breast Cancer - PCA with sklearn

Other Projects

  • Project - Used Car Price Prediction with XG-Boost
  • Project - Predict Career Longevity for NBA Rookies with Binary Classification - Logistic Regression
  • Project - Facial Classification - SVM
  • Project - Predict Sales Revenue with Interaction Term - Multiple Linear Regression
  • Project - Predict Sales Revenue - Simple Linear Regression
  • Project - Breast Cancer Tumor Diagnostic Classification - SVM
  • Project - Music Recommender
  • Project - Smarty Brain Image Prediction

Deep Learning Projects

  • Iris Flower Predictions App on Flask
  • ANN - Loan Default Prediction Project
  • ANN - Predict House Price for House Sales in King County, USA Project
  • ANN - Breast Cancer Wisconsin (Diagnostic) Project
  • CNN - Convolutional Neural Networks for Image Classification - MNIST data Project
  • CNN - Convolutional Neural Networks for Image Classification - CIFAR 10 data Project
  • CNN - Convolutional Neural Networks for Image Classification - Real Image - Malaria Detection Project
  • CNN - Convolutional Neural Networks for Image Classification - Fashion MNIST Data Project
  • RNN - Frozen Dessert Sales Forecasting with LSTM
  • NLP - Yelp Reviews Classification - Natural Language Processing Project
  • Average Eating Habits of UK Countries - Autoencoders

Data Analysis and Visualization Projects

  • Data Visualization with Python - Project: Data analysis and data visualization using Pandas and Matplotlib for countries' GDP, life expectancy comparison across continents, GDP per capita relative growth, population relative growth comparison, etc.
  • Fuel Economy Case Study - Project: Analyzing fuel economy data provided by the EPA for distributions of greenhouse gas score and combined mpg in 2008 and 2018, and for the correlations between displacement and combined mpg and between greenhouse gas score and combined mpg. Are more unique models using alternative fuels in 2018 compared to 2008? By how much? How much have vehicle classes improved in fuel economy (increased in mpg)? What are the characteristics of SmartWay vehicles? Have they changed over time? (mpg, greenhouse gas) What features are associated with better fuel economy (mpg)? What is the top vehicle which improved the most in terms of combined mpg from 2008 to 2018?
  • Wine Quality Case Study - Project: Analyzing wine data for the following points for wine businesses to model better wine. Is a certain type of wine (red or white) associated with higher quality? What level of acidity (pH value) receives the highest average rating? Do wines with higher alcoholic content receive better ratings? Do sweeter wines (more residual sugar) receive better ratings? White Vs Red Wine Proportions by Color & Quality
  • TV, Halftime Shows, and the Big Game - Project: Analyzing Super Bowl data and answering questions like - What are the most extreme game outcomes? How does the game affect television viewership? How have viewership, TV ratings, and ad cost evolved over time? Who are the most prolific musicians in terms of halftime show performances?
  • Weather Trend - Project: Analyzing global weather trends and Singapore weather trends, and comparing global vs Singapore 10-year moving average trends
  • Real-time Insights from Social Media Data - Project: Analyzing Twitter data and answering questions like: What are the global and local trends? Which trends are common to both? Also includes frequency analysis on tweets and hashtags, etc.
  • Statistics From Stock Data: Analyzing Google, Apple and Amazon stock prices and checking the rolling mean.
  • Android Play Store App Data Analysis - Project: Analyzing Android Play Store data and answering questions like - How many apps are paid? How much money are they making? When were these apps released?
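
The weather-trend and stock projects above both lean on a moving (rolling) average to smooth out year-to-year or day-to-day noise. A minimal sketch with pandas, using made-up yearly temperatures rather than the projects' real data; the series values and the `year` index name are illustrative assumptions.

```python
import pandas as pd

# Made-up yearly average temperatures in deg C; not real measurements.
temps = pd.Series(
    [26.0, 26.4, 26.2, 26.8, 27.0, 26.6, 27.2, 27.4, 27.1, 27.5, 27.8, 27.6],
    index=pd.RangeIndex(2000, 2012, name="year"),
)

# Each value is the mean of the current year and the nine years before it.
# The first nine entries are NaN because the window is not yet full.
moving_avg = temps.rolling(window=10).mean()

print(moving_avg.dropna())
```

The same `rolling(window=...).mean()` call applies to a daily closing-price series; only the window length and its interpretation change.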

Bootcamps

RL - Practical AI with Python and Reinforcement Learning - JP - On Hold

  • [x] 00. NumPy Crash Course
  • [x] 01. Matplotlib Visualization
  • [x] 02. Pandas and Scikit-learn
  • [x] 03. ANNs
  • [x] 04. CNNs
  • [x] 05. Introduction to gym
  • [ ] 06. Classical Q Learning
  • [ ] 07. Deep Q Learning
  • [ ] 08. Deep Q Learning on Images
  • [ ] 09. Creating Custom Open AI Gym Environment

Tensorflow 2.0: Deep Learning and Artificial Intelligence - LP

  • [x] Section 2 - Google Colab
  • [ ] Section 3 - Machine Learning and Neurons
  • [ ] Section 4 - Feedforward Artificial Neural Networks
  • [ ] Section 5 - CNN Convolutional Neural Networks
  • [ ] Section 6 - RNN - Recurrent Neural Networks, Time Series, Sequence Data
  • [ ] Section 7 - NLP
  • [ ] Section 8 - Recommender Systems
  • [ ] Section 9 - Transfer Learning for Computer Vision
  • [ ] Section 10 - GANs
  • [ ] Section 11 - Deep Reinforcement Learning (Theory)
  • [ ] Section 12 - Stock Trading Project with DL
  • [ ] Section 13: Advanced Tensorflow Usage
  • [ ] Section 14: Low - Level Tensorflow
  • [ ] Section 15: In-Depth: Loss Functions
  • [ ] Section 16: In-Depth: Gradient Descent
  • [ ] Section 17 - 21: Misc

DeepLearning.AI - Course 04.Sequences, Time Series and Predictions in Tensorflow

  • [x] Week 01 - Sequences and Prediction
  • [ ] Week 02 - Deep Neural Networks for Time Series
  • [ ] Week 03 - Recurrent Neural Networks for Time Series
  • [ ] Week 04 - Real-world time series data

DeepLearning.AI - Course 03.Natural Language Processing in Tensorflow

  • [x] Week 01 - Sentiment in Text
  • [x] Week 02 - Word Embeddings
  • [x] Week 03 - Sequence Models
  • [x] Week 04 - Sequence Models and Literature

DeepLearning.AI - Course 02.Convolutional Neural Networks in TensorFlow

  • [x] Week 01 - Exploring a Larger Dataset
  • [x] Week 02 - Augmentation: A technique to avoid overfitting
  • [x] Week 03 - Transfer Learning
  • [x] Week 04 - Multiclass Classification

DeepLearning.AI - Course 01.Introduction to TensorFlow for Artificial Intelligence, Machine Learning, and Deep Learning

  • [x] Week 01 - A New Programming Paradigm
  • [x] Week 02 - Introduction to Computer Vision
  • [x] Week 03 - Enhancing Vision with CNN
  • [x] Week 04 - Using Real-world images

Deep Learning TensorFlow Developer Certificate - ZTM - IN PROGRESS

  • [x] 01. Introduction
  • [x] 02. Deep Learning and Tensorflow Fundamentals
  • [ ] 03. Neural Network Regression with Tensorflow
  • [ ] 04. Neural Network Classification with Tensorflow
  • [ ] 05. Computer Vision and Convolutional Neural Networks in Tensorflow
  • [ ] 06. Transfer Learning - Feature Extraction
  • [ ] 07. Transfer Learning - Fine Tuning
  • [ ] 08. Transfer Learning - Scaling up
  • [ ] 09. Milestone Project 1 - Food Vision Big
  • [ ] 10. NLP Fundamentals in Tensorflow
  • [ ] 11. Milestone Project 2 - SkimLit
  • [ ] 12. Time Series Fundamentals + Milestone Project 3 - BitPredict
  • [ ] 13. Passing Tensorflow Certificate Exam
  • [ ] 15. Appendix - Machine Learning Primer
  • [ ] 16. Appendix - Machine Learning Framework
  • [ ] 14, 17-19. Misc

Complete Tensorflow 2 and Keras Deep Learning Bootcamp - JP

  • NumPy Crash Course
  • Pandas Crash Course
  • Visualization Crash Course
  • Basic Artificial Neural Networks - ANNs
    • Basic Keras Project
    • Predict House Price for House Sales in King County, USA - Regression Project
    • Breast Cancer Wisconsin (Diagnostic) - Classification Project
    • Loan Default Prediction - Classification Project
    • Tensorboard
  • Convolutional Neural Networks - CNNs
    • Convolutional Neural Networks for Image Classification - MNIST data
    • Convolutional Neural Networks for Image Classification - CIFAR 10 data
    • Convolutional Neural Networks for Image Classification - Real Image - Malaria Detection Project
    • Convolutional Neural Networks for Image Classification - Fashion MNIST Data Project
  • Recurrent Neural Networks - RNNs
    • Sinewave Example
    • RNN Example for Time Series - Advance Monthly Sales for Retail and Food Services
    • RNN Frozen Dessert Sales Forecasting with LSTM
    • Multivariate Time Series with RNN
  • Natural Language Processing - NLP
    • Generating Text with RNNs - Shakespeare
  • Auto Encoders
    • AutoEncoders for Dimensionality Reduction
    • AutoEncoders on Image Data - Dimensionality Reduction & Noise Removal
    • Average Eating Habits of UK Countries - Autoencoders
  • Generative Adversarial Networks - GANs
    • GANs - Generative Adversarial Networks with Dense Layers
    • DCGANs - Deep Convolutional Generative Adversarial Networks
  • Deployment

Machine Learning & Data Science Masterclass - JP

  • new track 2021 Python for Machine Learning & Data Science Masterclass
  • Python Crash Course
  • NumPy
  • Pandas
  • Matplotlib
  • Seaborn Data Visualizations
  • Data Analysis and Data Visualization Capstone Project
    • Fandango Vs other sites movie ratings
  • Linear Regression Models
  • Feature Engineering and Data Preparation
  • Cross Validation, Grid Search and Linear Regression Project
    • Ames Housing Data Project
  • Logistic Regression Models
    • Heart Disease Detection Project
  • KNN - K Nearest Neighbors
    • Sonar Data - Detecting Rock or Mine Project
  • SVM - Support Vector Machines
    • Wine Fraud Detection Project
  • Tree Based Methods - Decision Tree Learning
  • Random Forests
  • Boosting Methods
    • Mushroom Edible or Poisonous Prediction Project - with AdaBoost
    • Mushroom Edible or Poisonous Prediction Project - with Gradient Boosting
  • Supervised Learning Capstone Project - Cohort Analysis & Customer Churn Predictions
  • Naive Bayes Classification and Natural Language Processing (Supervised Learning)
    • NLP - Feature Extraction
    • Flight Tweets Sentiment Analysis - Classification
    • Movie Review Sentiment Analysis - Classification
  • K Means Clustering (Unsupervised Learning)
    • Color Quantization
    • CIA Country Analysis and Clustering
  • Hierarchical Clustering (Unsupervised Learning)
    • Cars Model Clustering
  • DBSCAN (Unsupervised Learning)
    • DBSCAN - Theory and Intuition
    • Hyperparameter Tuning
    • Wholesale Customers - Clustering
  • Principal Component Analysis (Unsupervised Learning)
    • PCA Manual Implementation
    • PCA with sklearn
    • PCA - Handwritten Digits classifications
  • Model Deployment
    • Serving model as API with Flask

Complete Machine Learning and Data Science - Zero to Mastery

  • Data Analysis with Pandas
  • Data Analysis with NumPy
  • Linear Regression with Polyfit - Data 36
  • Matplotlib - Data Visualizations
  • Scikit-learn - Creating Machine Learning Models
  • Milestone Project - Supervised Learning (Classification)- Heart Disease Detection
  • Milestone Project - Supervised Learning (Regression)- Bulldozer Sales Price Prediction
  • Deep Learning Project - Dog breed predictions

ML - Machine Learning & Data Science A-Z Hands-on Python - NS

  • [x] 03. Preprocessing
  • [x] 04. Machine Learning Types
  • [x] 05. Supervised Learning - Classification
  • [x] 06. Supervised Learning - Regression
  • [x] 07. Unsupervised Learning - Clustering
  • [x] 08. Hyper Parameters Optimization

Data Science and Machine Learning Bootcamp

  • Python Crash Course
  • Python for Data Analysis - NumPy
  • Python for Data Analysis - Pandas
  • Python for Data Visualization - Matplotlib
  • Python for Data Visualization - Seaborn
  • Pandas Built In Data Visualization
  • Visualization with Plotly and Cufflinks
  • Data Capstone Projects
    • 911 Calls - Data Capstone Project
  • Linear Regression
    • Ecommerce Project
  • Logistic Regression
    • Advertisement Project
  • K Nearest Neighbors (KNN)
    • Anonymized Data Project
  • Decision Tree and Random Forest
    • Loan Prediction Project
  • Support Vector Machine (SVM)
    • Iris Flower Project
  • K Means Clustering
  • Principal Component Analysis
  • Recommender Systems
  • Natural Language Processing
    • Yelp Reviews Classification
  • Neural Nets and Deep Learning
    • Regression Project - Predict House Price for House Sales in King County, USA
    • Classification Project - Breast Cancer Wisconsin (Diagnostic)
    • Final Project - Classification - Loan Default Prediction
    • TensorBoard
  • Big Data and Spark with Python
  • SciPy

Complete Data Science Bootcamp - 365

  • [x] Part 1 - The Field of Data Science
  • [x] Part 2 - Probability
  • [ ] Part 3 - Statistics (Descriptive & Inferential)
  • [x] Part 4 - Python
  • [ ] Part 5 - Advanced Statistical Methods in Python / Machine Learning in Python
  • [x] Part 6 - Mathematics
  • [ ] Part 7 - Deep Learning
  • [ ] Software Integration
  • [ ] Case Study - Absenteeism

Books

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (in progress)

  • [x] The Fundamentals of Machine Learning
  • [x] The Machine Learning Landscape
  • [x] End-to-End Machine Learning Project
  • [x] Classification
  • [ ] Training Models

The Hundred-Page Machine Learning Book

  • [x] Introduction
  • [x] Notation and Definitions
  • [x] Fundamental Algorithms
  • [x] Anatomy of a Learning Algorithm
  • [x] Basic Practice
  • [ ] Neural Networks and Deep Learning
  • [ ] Problems and Solutions
  • [ ] Advanced Practice
  • [ ] Unsupervised Learning
  • [ ] Unsupervised Learning - in-depth material
  • [ ] Other Forms of Learning
  • [ ] Conclusion

Advancing Machine Learning & Data Science Journey - (In Progress)

To skill up my ML & DS related skills in specific areas and topics:

Applied Machine Learning - Ensemble Learning

  • Project: Titanic dataset
  • 01.ML Basic
  • 02.Preparing the Data
  • 03.Ensemble Learning
  • 04.Boosting
  • 05.Bagging
  • 06.Stacking
  • 07.Evaluation and Selection of Models

Applied Machine Learning - Feature Engineering

  • Project: Titanic dataset
  • 01.ML Basic
  • 02.Intro to Feature Engineering
  • 03.Explore Data
  • 04.Create and Clean Features
  • 05.Prepare Features for Modelling
  • 06.Compare and Evaluate Models

Applied Machine Learning - Algorithms

  • Project: Titanic dataset
  • 01.Review of Foundation
  • 02.Logistic Regression
  • 03.Support Vector Machine
  • 04.Multi-layer Perceptron
  • 05.Random Forest
  • 06.Boosting
  • 07.Final Model Selection and Evaluation

Applied Machine Learning - Foundation

  • Project: Titanic dataset
  • 01.ML Basic
  • 02.Exploratory Data Analysis and Data Cleaning
  • 03.Evaluation - Measuring Success
  • 04.Optimizing a Model
  • 05.End to End Pipeline

ML - Mistakes to avoid in Machine Learning

  • [x] Assuming Data is good to go
  • [x] Neglecting to consult subject matter experts
  • [x] Overfitting your models
  • [x] Not standardizing your data
  • [x] Focusing on Wrong Factors
  • [x] Data Leakage
  • [x] Forgetting traditional statistics tools
  • [x] Assuming Deployment is a breeze
  • [x] Assuming Machine Learning is the answer
  • [x] Developing in a silo
  • [x] Not treating for imbalanced sampling
  • [x] Interpreting your coefficients without properly treating for multicollinearity
  • [x] Evaluating by accuracy alone
  • [x] Giving overly technical presentations
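
Several items on this list (standardizing, data leakage, imbalanced sampling, evaluating by accuracy alone) can be addressed together in one cross-validation setup. The sketch below is an illustration under stated assumptions, not material from the course: the data is synthetic, and the choice of logistic regression with `class_weight="balanced"` is only one possible treatment for imbalance.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced data (roughly 90% negatives); purely illustrative.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=0
)

# Scaling inside the pipeline means the scaler is fit on each training fold
# only, so no statistics from the validation fold leak into training.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced"),
)

# Report more than accuracy, which can mislead on imbalanced classes.
scores = cross_validate(model, X, y, cv=5, scoring=["accuracy", "recall", "f1"])
for name in ("accuracy", "recall", "f1"):
    print(name, round(scores[f"test_{name}"].mean(), 3))
```

Fitting the scaler on the full dataset before splitting would be the leaky variant of the same code.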

Deep Learning, Machine Learning, AI & Data Science

  • [x] Deep Learning - Natural Language Processing with TensorFlow
  • [ ] Deep Learning - Face Recognition
  • [x] Deep Learning - Image Recognition
  • [x] Deep Learning - Building Deep Learning Applications with Keras 2.0
  • [x] Applied Machine Learning - Ensemble Learning
  • [x] Applied Machine Learning - Feature Engineering
  • [x] Applied Machine Learning - Algorithms
  • [x] Applied Machine Learning - Foundation
  • [x] Machine Learning with Python - 03_k-Means Clustering
  • [x] Machine Learning with Python - 02_Decision Trees
  • [x] Machine Learning with Python - 01_Foundations
  • [x] ML - Mistakes to avoid in Machine Learning
  • [x] ML - Classification Modelling with Iris flowers
  • [x] Data Science A-Z Modeling
  • [x] Designing for Neural Networks and AI Interfaces
  • [x] Introduction to GPT-3: A Leap in Artificial Intelligence

Data Analysis, Manipulation & Data Visualization

  • [ ] DA & DV - Python Data Analysis & Visualization Masterclass
  • [x] Pandas - Pandas Code Challenges
  • [x] Pandas - Advanced Pandas
  • [x] DV - Data Visualizations with Plotly
  • [x] DA - Data Analysis with Pandas and Python - BP
  • [x] DA - Python Data Playbook - Cleaning Data
  • [x] Pandas - Pandas Playbook - Manipulating Data
  • [x] More Python Data Tools - Microsoft

Apache Spark & PySpark

  • [x] Intro to Spark SQL and DataFrames
  • [x] Apache Spark Essential Training
  • [ ] Spark for Machine Learning & AI
  • [x] Apache PySpark by Example
  • [x] Apache Spark Deep Learning Essential Training

Data Scientist Reading Materials

  • Supervised Learning
    • [x] Lesson 01: Machine Learning Bird's Eye View
    • [ ] Lesson 02: Linear Regression
    • [ ] Lesson 03: Perceptron Algorithm
    • [ ] Lesson 04: Decision Trees
    • [ ] Lesson 05: Naive Bayes
    • [ ] Lesson 06: Support Vector Machines
    • [ ] Lesson 07: Ensemble Methods
    • [x] Lesson 08: Model Evaluation Metrics
    • [ ] Lesson 09: Training and Tuning
    • [ ] Lesson 10: Finding Donors Project

Kaggle Courses

  • [x] Python
  • [x] Pandas
  • [x] Data Cleaning
  • [x] Introduction to Machine Learning
  • [x] Machine Learning Intermediate
  • [ ] Feature Engineering
  • [ ] Machine Learning Explainability
  • [x] Data Visualization
  • [ ] Intro to Deep Learning
  • [ ] Intro to Game AI and Reinforcement Learning
  • [ ] Natural Language Processing
  • [ ] Micro-challenges
  • [ ] Computer Vision
  • [ ] Intro to SQL
  • [ ] Advanced SQL

Google ML courses

  • [ ] ML Crash Course
  • [ ] Problem Framing
  • [ ] Data Prep
  • [ ] Clustering
  • [ ] Recommendation
  • [ ] Testing and Debugging
  • [ ] GANs

Probability & Statistics (in progress)

  • Linear Regression Analysis
  • Multi Regression Analysis
  • Practical Statistics
    • Admission Case Study with Python (Simpson's Paradox)
    • Simulating Coin Flips & Probability
    • Simulating multiple Coin Flips & Binomial Distribution
    • Cancer Test Results
    • Conditional Probability & Bayes Rules
  • Excel Data Manipulation, Analysis and Visualization
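
The coin-flip and binomial-distribution exercises listed above can be sketched with the standard library alone. This is an illustrative version, not the notebook's code; the helper names `binomial_pmf` and `simulate_heads`, the seed and the trial count are all made up for the example.

```python
import math
import random


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def simulate_heads(n: int, p: float, trials: int, seed: int = 0) -> list[float]:
    """Empirical frequency of each possible head count over repeated experiments."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(trials):
        # One experiment: flip n coins and count how many land heads.
        heads = sum(rng.random() < p for _ in range(n))
        counts[heads] += 1
    return [c / trials for c in counts]


n, p = 10, 0.5
empirical = simulate_heads(n, p, trials=20_000)
for k in range(n + 1):
    exact = binomial_pmf(k, n, p)
    print(f"{k:2d} heads: simulated {empirical[k]:.3f}, exact {exact:.3f}")
```

With enough trials the simulated frequencies settle close to the exact binomial probabilities, which is the point of the exercise.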

Data Science Math Skills - Duke University

Topics include:

  • Set theory, including Venn diagrams
  • Properties of the real number line
  • etc

License

This project is licensed under the MIT License - see the LICENSE.md file for details