
📊 All the courses, assignments, exercises, mini-projects and books I have completed so far while teaching myself Machine Learning and Data Science.

📊 data-science-learning

A list of what I have finished so far while teaching myself Machine Learning and Data Science.

🔥 Projects

  • [x] Setting up a café in Ho Chi Minh City -- finds the best place to set up a new business -- article -- source.
  • [x] Titanic: Machine Learning from Disaster (from Kaggle) -- predicts which passengers survived the Titanic shipwreck -- source.

I also do some mini-projects to understand the concepts. The HTML files (exported from the corresponding Jupyter Notebook files) and the "Open in Colab" files for the mini-projects below are available here.

🎲 Tasks

  • [x] Anomaly Detection -- my note
  • [x] Data Aggregation -- my note
  • [x] Data Overview -- my note
  • [x] Data Visualization
  • [x] Model Evaluation
  • [x] Preprocessing (texts, images, dates & times, structured data) -- my note
  • [x] Testing -- my note
  • [x] Web Scraping

🐍 Programming Languages

  • [x] GraphQL -- an open-source data query and manipulation language for APIs, and a runtime for fulfilling queries with existing data.
  • [x] Python -- an interpreted, high-level, general-purpose programming language -- my note.
  • [x] R -- a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing.
  • [ ] Scala -- a general-purpose programming language providing support for functional programming and a strong static type system.
  • [x] SQL -- a domain-specific language used in programming and designed for managing data held in a relational database management system, or for stream processing in a relational data stream management system.

βš™οΈ Frameworks & Platforms

  • [x] Apache Airflow -- my note
  • [x] Docker -- a set of platform-as-a-service products that use OS-level virtualization to deliver software in packages called containers -- my note
  • [x] Google Colab -- a free cloud service based on Jupyter Notebooks for machine-learning education and research -- my note.
  • [ ] Google Kubernetes
  • [ ] Hadoop -- a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation.
  • [x] Kaggle -- an online community of data scientists and machine learners, owned by Google.
  • [x] PostgreSQL (Postgres) -- a free and open-source relational database management system emphasizing extensibility and technical standards compliance.
  • [ ] Spark -- an open-source distributed general-purpose cluster-computing framework.

βš’οΈ Tools

  • [x] Bash -- my note
  • [x] Git -- a distributed version-control system for tracking changes in source code during software development -- my note.
  • [x] Markdown -- a lightweight markup language with plain text formatting syntax -- my note.
  • [x] Jupyter Notebook -- an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text -- my note.
  • [x] Trello -- a web-based Kanban-style list-making application.

📚 Libraries & Frameworks

A "ticked" library doesn't mean that I know or understand all of it (but I can use it easily with its documentation)!

  • [ ] D3.js -- a JavaScript library for producing dynamic, interactive data visualizations in web browsers.
  • [x] Keras -- an open-source neural-network library written in Python.
  • [x] Matplotlib -- a plotting library for the Python programming language and its numerical mathematics extension NumPy -- my note
  • [x] NumPy -- a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays -- my note
  • [ ] OpenCV -- a library of programming functions mainly aimed at real-time computer vision.
  • [x] Pandas -- a software library written for the Python programming language for data manipulation and analysis -- my note
  • [ ] Plotly -- the front-end for ML and data science models.
  • [x] PyTorch -- my note
  • [x] Seaborn -- a Python data visualization library based on Matplotlib.
  • [x] Scikit-learn -- a free software machine learning library for the Python programming language.
  • [x] TensorFlow -- a free and open-source software library for dataflow and differentiable programming across a range of tasks.

πŸ‘¨β€πŸ« Courses

The unchecked courses are still in progress!

📖 Books

The unchecked books are still in progress!

🤖 GitHub repositories

🌏 Other resources

  • Papers With Code -- a free and open resource with Machine Learning papers, code and evaluation tables.
  • Chris Albon's notes -- Notes On Using Data Science & Artificial Intelligence To Fight For Something That Matters.
  • Seeing Theory -- a visual introduction to probability and statistics.
  • Collection of useful articles for understanding concepts in ML, AI and DS.

The descriptions of terms on this site are borrowed from Wikipedia.