datasci
Self-study plan to achieve mastery in data science
Zero to Mastery in Data Science.
Study plan overview
- Module 0 - Elementary to High School Math
- Module 1 - College Math I (Calculus)
- Module 2 - College Math II (Linear Algebra)
- Module 3 - College Math III (Discrete Math)
- Module 4 - College Math IV (Probability and Statistics)
- Module 5 - Computation and Algorithms
- Module 6 - Artificial Intelligence and Machine Learning
- Module 7 - Deep Learning
- Module 8 - Data Mining and Recommenders
- Module 9 - NLP and Computer Vision
Module 0 - Elementary to High School Math
Not everyone was fortunate enough to have a good start with math growing up. The goal of this module is to level the playing field: by the end of Module 0 you should feel as though you went to a high school with world-class teachers and finished at the top of your math class.
If you consider yourself bad at math, or if you "hated math" in school, the best advice is to start at the lowest level you can. Start at preschool math if you have to, but find the level of math you can follow easily. Resist skipping ahead and go through the program level by level. Do not advance to the next level until you have mastered the current one. If the current level is too hard, go back to an earlier level. I've linked some courses here that are probably a good fit for most people, but you can find even more elementary courses on Khan Academy if you need them.
- [ ] Khan - Pre-algebra 67%
- [ ] Khan - Algebra basics 88%
- [ ] Khan - Algebra I 74%
- [ ] Khan - High School Geometry - 81%
- [ ] Khan - Algebra II 71%
- [ ] Khan - Trigonometry - 79%
- [ ] Khan - College Algebra 82%
- [ ] Khan - Statistics and Probability 30%
- [ ] Khan - Pre College Statistics
- [ ] Khan - College Statistics
- [ ] Khan - Pre-Calculus
- [ ] Khan - Calculus 1
- [ ] Khan - Calculus 2
- [ ] Khan - Multivariable Calculus
- [ ] Khan - Differential Calculus
- [ ] Khan - Integral Calculus
- [ ] Khan - Pre College Calculus
- [ ] Khan - College Calculus AB 4%
- [ ] Khan - College Calculus BC
Supplementary Material
- [ ] 📖 The Joy of X
- [ ] 3B1B - Lockdown Math
Module 1 - College Math I (Calculus)
Supplementary Material
- Calculus, Better Explained
- Essence of Calculus
- Engineering Math
- Intro to Calculus with Derivatives
- Coursera - Introduction to Complex Analysis
- Coursera - Mathematics for Machine Learning: Multivariate Calculus
- Prof. Leonard - Calculus I
- MIT - Differential Equations
- Khan - Differential Equations
Module 2 - College Math II (Linear Algebra)
- [ ] Khan - Linear Algebra
- [ ] Coursera - Mathematics for Machine Learning: Linear Algebra
- [ ] Coursera - Mathematics for Machine Learning: Principal Component Analysis
- [ ] Fast AI - Computational Linear Algebra
- [ ] MIT - Linear Algebra
Required Reading
- [ ] 📖 Linear Algebra and Its Applications
- [ ] 📖 Coding the Matrix
Supplementary Material
- Matrix Calculus for Deep Learning
- Graphical linear algebra
- Essence of Linear Algebra
- Brown University - Coding the Matrix
- Udacity - Linear Algebra Refresher Course
- https://open.math.uwaterloo.ca/
- https://www.youtube.com/playlist?list=PL44B6B54CBF6A72DF
Module 3 - College Math III (Discrete Math)
3.1 Proofs and Logic
Topics: proofs, set theory, propositional logic, induction, invariants, and state machines.
- [ ] Coursera - What is a Proof?
- [ ] MIT - Mathematics for Computer Science (2015): Unit 1
- [ ] MIT - Mathematics for Computer Science (2010): Weeks 1, 2, 3
- [ ] 📖 How to Prove It
- [ ] 📖 Book of Proof
- https://www.logicmatters.net/resources/pdfs/TeachYourselfLogic2017.pdf
3.2 Number Theory
Number theory is fundamental to reasoning about numbers as discrete mathematical structures, with applications in cryptography and efficient numerical computation.
By the end of this sub-module you should be very confident proving and reasoning about concepts including divisibility, Bézout's identity, modular arithmetic, Euler's totient theorem, Fermat's little theorem, integer factorization, Diophantine equations, the fundamental theorem of arithmetic, the Chinese remainder theorem, RSA, and the discrete logarithm problem.
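To make a few of these concepts concrete, here is a minimal Python sketch. It is illustrative only and is not taken from the courses below. It shows the extended Euclidean algorithm (which produces Bézout coefficients and modular inverses), a Chinese remainder theorem solver for pairwise coprime moduli, a brute-force Euler totient, and a textbook-sized RSA round trip. The function names and the toy parameters (primes 61 and 53, exponent 17, message 65) are hypothetical choices; real RSA uses very large primes and padding.

```python
from math import gcd
from typing import List, Tuple


def extended_gcd(a: int, b: int) -> Tuple[int, int, int]:
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y == g (Bezout's identity)."""
    if b == 0:
        return a, 1, 0
    g, x1, y1 = extended_gcd(b, a % b)
    # b*x1 + (a % b)*y1 == g and a % b == a - (a // b)*b; regroup by a and b.
    return g, y1, x1 - (a // b) * y1


def mod_inverse(a: int, m: int) -> int:
    """Multiplicative inverse of a modulo m; it exists only when gcd(a, m) == 1."""
    g, x, _ = extended_gcd(a, m)
    if g != 1:
        raise ValueError(f"{a} has no inverse modulo {m}")
    return x % m


def crt(residues: List[int], moduli: List[int]) -> int:
    """Smallest x >= 0 with x % m_i == r_i for pairwise coprime moduli (CRT)."""
    big_m = 1
    for m in moduli:
        big_m *= m
    x = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m  # product of all the other moduli
        x += r * partial * mod_inverse(partial, m)
    return x % big_m


def totient(n: int) -> int:
    """Euler's totient by brute force: count 1 <= k <= n with gcd(k, n) == 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


# Toy RSA with textbook-sized primes: insecure, for illustration only.
p, q = 61, 53
n = p * q                  # public modulus
phi_n = (p - 1) * (q - 1)  # equals totient(n) because p and q are prime
e = 17                     # public exponent, chosen coprime to phi_n
d = mod_inverse(e, phi_n)  # private exponent: e*d == 1 (mod phi_n)
message = 65
ciphertext = pow(message, e, n)
recovered = pow(ciphertext, d, n)  # Euler's theorem (message coprime to n) gives back 65

if __name__ == "__main__":
    print(extended_gcd(240, 46))      # (2, -9, 47): 240*(-9) + 46*47 == 2
    print(crt([2, 3, 2], [3, 5, 7]))  # 23
    print(pow(3, 60, 61))             # 1, by Fermat's little theorem (61 is prime)
    print(d, ciphertext, recovered)   # 2753 2790 65
```

Working the same numbers by hand, and then proving why each step is guaranteed to work, is a good check that you have reached the level of confidence described above.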
- [ ] Coursera - Number Theory and Cryptography
- [ ] MIT - Mathematics for Computer Science (2010) - Number Theory I and II
- [ ] MIT - Mathematics for Computer Science (2015) - GCDs, Congruences, Euler's Theorem, and RSA
Problem Sets
- [ ] MIT - Mathematics for Computer Science (2010): Recitation 4
- [ ] MIT - Mathematics for Computer Science (2010): Recitation 5
- [ ] MIT - Mathematics for Computer Science (2010): Assignment 3
Optional Supplementary Material
- [ ] Coursera - Classical Cryptosystems and Core Concepts
- [ ] Coursera - Mathematical Foundations for Cryptography
3.3 Combinatorics
Combinatorics is a vital skill in reasoning about the size of finite sets.
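As a quick illustration of the four basic counting rules, the sketch below counts the ways to choose k items from n (with or without order, with or without repetition) twice: once by brute-force enumeration with `itertools` and once with closed-form formulas using `math.comb` and `math.perm` (Python 3.8+). It is an illustrative example, not material from the listed courses, and the function names are hypothetical.

```python
from itertools import combinations, combinations_with_replacement, permutations, product
from math import comb, perm


def brute_force_counts(n: int, k: int) -> dict:
    """Count selections of k items from n by explicitly enumerating them."""
    items = range(n)
    return {
        # order matters, repetition allowed
        "sequences": sum(1 for _ in product(items, repeat=k)),
        # order matters, no repetition
        "permutations": sum(1 for _ in permutations(items, k)),
        # order ignored, no repetition
        "subsets": sum(1 for _ in combinations(items, k)),
        # order ignored, repetition allowed
        "multisets": sum(1 for _ in combinations_with_replacement(items, k)),
    }


def formula_counts(n: int, k: int) -> dict:
    """The same four counts from closed-form formulas."""
    return {
        "sequences": n ** k,              # product rule
        "permutations": perm(n, k),       # n! / (n - k)!
        "subsets": comb(n, k),            # n! / (k! (n - k)!)
        "multisets": comb(n + k - 1, k),  # stars and bars
    }


if __name__ == "__main__":
    for n, k in [(5, 2), (6, 3), (4, 4)]:
        counts = formula_counts(n, k)
        print(n, k, counts, counts == brute_force_counts(n, k))
```

Agreement on small inputs is a sanity check; the formulas are what scale, since enumeration takes time proportional to the very counts being computed.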
- [ ] Coursera - Combinatorics and Probability
- [ ] Coursera - Introduction to Enumerative Combinatorics
- [ ] MIT - Mathematics for Computer Science (2010) - Counting Rules I and II
- [ ] MIT - Mathematics for Computer Science (2015) - Counting
Problem Sets
- [ ] MIT - Mathematics for Computer Science (2010): Recitation 15
- [ ] MIT - Mathematics for Computer Science (2010): Recitation 16
- [ ] MIT - Mathematics for Computer Science (2010): Assignment 9
3.4 Graph Theory
- [ ] Coursera - Introduction to Graph Theory
- [ ] Coursera - Solving the Delivery Problem
- [ ] 📖 Introduction to Graph Theory
- [ ] Sarada Herke - Graph Theory Course
- [ ] http://courses.csail.mit.edu/6.889/fall11/lectures/
Discrete Math Supplementary Material
- Visual Group Theory
- https://www.coursera.org/learn/discrete-mathematics#%20
- https://www.youtube.com/playlist?list=PLZzHxk_TPOStgPtqRZ6KzmkUQBQ8TSWVX
- 📖 Concrete Mathematics
Module 4 - College Math IV (Probability and Statistics)
4.1 Probability
- [ ] edX/Harvard - Probability from the Ground Up
- [ ] edX/Harvard STAT110 - Introduction to Probability
- [ ] edX/MIT 6.431 - Introduction to Probability - The Science of Uncertainty
4.2 Statistics
Module 5 - Computation and Algorithms
Algorithms
- [ ] Coursera - Divide and Conquer, Sorting and Searching, and Randomized Algorithms
- [ ] Coursera - Graph Search, Shortest Path, and Data Structures
- [ ] Coursera - Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming
- [ ] Coursera - Shortest Paths Revisited, NP-Complete Problems
Resources
- [ ] 📖 Grokking Algorithms
- [ ] 📖 Algorithms to Live By
- [ ] 📖 Introduction to Algorithms (CLRS)
- [ ] 📖 The Algorithm Design Manual
- [ ] 📖 Algorithms (Dasgupta)
- [ ] 📖 Algorithm Design (Kleinberg and Tardos)
- [ ] 📖 Algorithms (Sedgewick)
- Khan Algorithms
- MIT 6.006 - Introduction to Algorithms
- Intro to Algorithms
- Algorithmic Thinking I
- Algorithmic Thinking II
- https://www.youtube.com/watch?v=T_WffoMAaMA
- https://www.coursera.org/specializations/data-structures-algorithms
- https://www.youtube.com/user/mycodeschool
- http://www3.cs.stonybrook.edu/~algorith/
- https://www.youtube.com/watch?v=ufj5_bppBsA&list=PLFDnELG9dpVxQCxuD-9BSy2E7BWY3t5Sm&index=7
- https://www.youtube.com/user/mikeysambol/playlists
- https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/
- Programming Conversations
- Efficient Programming with Components
- Four Algorithmic Journeys
- Computer Science: Algorithms, Theory, and Machines
- Data Structures & Algorithms Specialization
- Approximation Algorithms I & II
- http://jeffe.cs.illinois.edu/teaching/algorithms/?#book
Information Theory
Python and Computation and Data
- https://www.edx.org/course/introduction-computer-science-mitx-6-00-1x-10
- https://www.edx.org/course/introduction-computational-thinking-data-mitx-6-00-2x-5
Module 5.5 - Databases and Computer Architecture
- [ ] Coursera - Data Systems Specialization
- [ ] Coursera - Data Visualization Specialization
- [ ] Coursera - Computer Architecture
- [ ] MIT Computer System Engineering
- [ ] MIT Information and Entropy
- [ ] Coursera - Computer Science: Algorithms, Theory, and Machines
Supplementary
- https://www.coursera.org/learn/introduction-mongodb
- https://university.mongodb.com/
- https://www.khanacademy.org/computing/computer-science/informationtheory
- https://www.youtube.com/playlist?list=PLSE8ODhjZXjbisIGOepfnlbfxeH7TW-8O
- https://www.brianstorti.com/replication/
Module 6 - Artificial Intelligence and Machine Learning
https://www.coursera.org/specializations/aml
Artificial Intelligence
- https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-868j-the-society-of-mind-fall-2011/video-lectures/
- https://www.youtube.com/watch?feature=player_embedded&v=J6PBD-wNEDs
- http://ai.berkeley.edu/lecture_videos.html
- https://www.udacity.com/course/artificial-intelligence-for-robotics--cs373
- http://aiplaybook.a16z.com/
- https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-034-artificial-intelligence-fall-2010/lecture-videos/
- http://rll.berkeley.edu/deeprlcourse/
Machine Learning
Machine Learning Specialization by University of Washington on Coursera
- [ ] Machine Learning Foundations: A Case Study Approach
- [ ] Machine Learning: Regression
- [ ] Machine Learning: Classification
- [ ] Machine Learning: Clustering & Retrieval
- https://www.analyticsvidhya.com/blog/2015/07/top-youtube-videos-machine-learning-neural-network-deep-learning/
- Statistical Machine Learning 10-702/36-702
- https://www.udacity.com/ai
- https://www.udacity.com/drive
- https://www.udacity.com/course/machine-learning-engineer-nanodegree--nd009
- https://www.edx.org/xseries/data-science-engineering-apacher-sparktm
- https://www.coursera.org/specializations/data-mining
- https://www.coursera.org/specializations/machine-learning
- http://web.stanford.edu/class/cs20si/syllabus.html
- https://work.caltech.edu/telecourse.html
- https://www.youtube.com/watch?v=bxe2T-V8XRs
- https://www.youtube.com/watch?v=UVwwYZMFocg&list=PLiaHhY2iBX9ihLasvE8BKnS2Xg8AhY6iV&index=8
- https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-868j-the-society-of-mind-fall-2011/video-lectures/
- https://www.coursera.org/specializations/gcp-data-machine-learning
Module 7 - Deep Learning
Deep Learning by deeplearning.ai on Coursera
- [ ] Neural Networks and Deep Learning
- [ ] Improving Deep Neural Networks: Hyperparameter Tuning, Regularization, and Optimization
- [ ] Structuring Machine Learning Projects
- [ ] Convolutional Neural Networks
- [ ] Sequence Models
Goals:
- [ ] different activation functions (sigmoid/tanh/ReLU)
- [ ] different cost functions
- [ ] with and without bias units
- [ ] classification and regression problems
- [ ] text / binary / image / recommenders
- [ ] batch vs. stochastic gradient descent
- [ ] JavaScript, Python, PHP, MATLAB, TensorFlow, scikit-learn
- [ ] create visualizations and blog explanations
- [ ] audit the best courses / books
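Several of these goals can be explored in a few dozen lines before reaching for a framework. The pure-Python sketch below is an illustrative assumption, not material from the courses above. It defines sigmoid, tanh, and ReLU with their derivatives, a binary cross-entropy cost, and a single linear unit trained on mean squared error with either full-batch or stochastic gradient descent, optionally with the bias frozen at zero. The helper names, the learning rate, and the synthetic data y = 3x + 2 are hypothetical choices.

```python
import math
import random
from typing import List, Tuple

Sample = Tuple[float, float]


# Activation functions and their derivatives with respect to the pre-activation z.
def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def d_sigmoid(z: float) -> float:
    s = sigmoid(z)
    return s * (1.0 - s)


def d_tanh(z: float) -> float:
    return 1.0 - math.tanh(z) ** 2  # tanh itself is math.tanh


def relu(z: float) -> float:
    return max(0.0, z)


def d_relu(z: float) -> float:
    return 1.0 if z > 0 else 0.0  # subgradient choice of 0 at z == 0


# Two cost functions: binary cross-entropy (classification) and MSE (regression).
def binary_cross_entropy(y_true: int, p: float, eps: float = 1e-12) -> float:
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def mse(w: float, b: float, data: List[Sample]) -> float:
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def mse_gradients(w: float, b: float, batch: List[Sample]) -> Tuple[float, float]:
    """Gradient of the batch MSE of the linear unit w*x + b with respect to w and b."""
    n = len(batch)
    dw = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
    db = sum(2.0 * (w * x + b - y) for x, y in batch) / n
    return dw, db


def train(data: List[Sample], batch_size: int, lr: float = 0.05, epochs: int = 200,
          use_bias: bool = True, seed: int = 0) -> Tuple[float, float]:
    """batch_size == len(data) is full-batch gradient descent; batch_size == 1 is stochastic."""
    rng = random.Random(seed)
    data = list(data)  # shuffle a copy, not the caller's list
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            dw, db = mse_gradients(w, b, data[start:start + batch_size])
            w -= lr * dw
            if use_bias:  # without a bias unit, b stays fixed at 0
                b -= lr * db
    return w, b


if __name__ == "__main__":
    data = [(i / 10, 3 * (i / 10) + 2) for i in range(-10, 11)]  # noiseless y = 3x + 2
    runs = [("full batch", len(data), True), ("stochastic", 1, True), ("no bias", 1, False)]
    for label, batch_size, use_bias in runs:
        w, b = train(data, batch_size, use_bias=use_bias)
        print(f"{label:>10}: w={w:.3f} b={b:.3f} mse={mse(w, b, data):.4f}")
```

The full-batch run takes one exact gradient step per epoch, while the stochastic run takes one noisier step per sample. Freezing the bias shows why bias units matter: on this data the mean squared error cannot fall below 4, because a line through the origin cannot reproduce the intercept of 2.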
- http://explained.ai/matrix-calculus/index.html
- Practical Deep Learning For Coders
- https://classroom.udacity.com/courses/ud730
- http://neuralnetworksanddeeplearning.com/
- http://course.fast.ai/
- http://www.deeplearningbook.org/
- http://cs231n.github.io/ + https://www.youtube.com/playlist?list=PLlJy-eBtNFt6EuMxFYRiNRS07MCWN5UIA
- https://www.youtube.com/playlist?list=PL6Xpj9I5qXYEcOhn7TqghAJ6NAPrNmUBH
- http://rll.berkeley.edu/deeprlcourse/
- http://rll.berkeley.edu/deeprlcourse/#lecture-videos
- http://introtodeeplearning.com/index.html
- https://www.youtube.com/watch?v=21EiKfQYZXc&app=desktop
- https://courses.csail.mit.edu/6.042/spring17/mcs.pdf
- http://yerevann.com/a-guide-to-deep-learning/
- https://www.coursera.org/learn/neural-networks
- https://www.youtube.com/playlist?list=PLE6Wd9FR--EfW8dtjAuPoTuPcqmOV53Fu
- https://cloud.google.com/blog/big-data/2017/01/learn-tensorflow-and-deep-learning-without-a-phd
- https://www.udacity.com/course/deep-learning--ud730
- http://nbviewer.jupyter.org/github/domluna/labs/blob/master/Build%20Your%20Own%20TensorFlow.ipynb
- https://goc.vivint.com/problems/mlc
- http://blog.floydhub.com/coding-the-history-of-deep-learning/
- https://stats385.github.io/
- https://p.migdal.pl/interactive-machine-learning-list/
- https://scrimba.com/g/gneuralnetworks
Module 8 - Data Mining and Recommenders
- https://www.coursera.org/specializations/recommender-systems
- https://www.coursera.org/specializations/data-mining
- https://www.coursera.org/specializations/big-data
- https://nlp.stanford.edu/IR-book/information-retrieval-book.html
- https://nlp.stanford.edu/IR-book/information-retrieval.html
- https://www.coursera.org/specializations/data-warehousing
- https://www.coursera.org/specializations/gcp-data-machine-learning
- https://www.coursera.org/specializations/data-science
- https://www.coursera.org/learn/scala-spark-big-data
Module 9 - NLP and Computer Vision
NLP
- https://github.com/oxford-cs-deepnlp-2017/lectures
- https://www.youtube.com/watch?v=OQQ-W_63UgQ&list=PL3FW7Lu3i5Jsnh1rnUwq_TcylNr7EkRe6
Image and Computer Vision
- https://www.coursera.org/learn/digital/home/welcome
- http://cs231n.stanford.edu/syllabus.html
- https://www.udacity.com/course/interactive-3d-graphics--cs291
- https://www.youtube.com/watch?v=01YSK5gIEYQ&list=PL_w_qWAQZtAZhtzPI5pkAtcUVgmzdAP8g
Electives
- http://cagd.cs.byu.edu/~557/text/ch1.pdf
- https://www.coursera.org/learn/data-driven-astronomy
- https://www.coursera.org/specializations/genomic-data-science
- https://www.coursera.org/learn/data-genes-medicine
- https://www.coursera.org/specializations/systems-biology
- https://www.coursera.org/specializations/networking-basics
- https://www.coursera.org/learn/neurohacking
- https://www.youtube.com/playlist?list=PLUl4u3cNGP62K2DjQLRxDNRi0z2IRWnNh
- Raft/Paxos, CAP theorem, redundancy
Resources
- https://www.youtube.com/playlist?list=PLoROMvodv4rMWw6rRoeSpkiseTHzWj6vu&disable_polymer=true
- https://github.com/open-source-society/data-science
- https://unsupervisedmethods.com/over-150-of-the-best-machine-learning-nlp-and-python-tutorials-ive-found-ffce2939bd78
- http://www.scipy-lectures.org/
- https://github.com/mr-mig/every-programmer-should-know
- https://online-learning.harvard.edu/series/professional-certificate-data-science
- Computational geometry https://www.youtube.com/watch?v=rho8QqiHOe4
- Kaggle Learn https://www.kaggle.com/learn/overview
- MIT self-driving cars https://selfdrivingcars.mit.edu/
- MIT AGI https://agi.mit.edu/
- https://ai.google/education
- https://mlcourse.ai/
- https://mml-book.github.io/
- https://github.com/lexfridman/mit-deep-learning/blob/master/README.md#mit-deep-learning
- http://d2l.ai/chapter_introduction/index.html
- https://www.jgoertler.com/visual-exploration-gaussian-processes/
- https://lectures.quantecon.org/py/short_path.html
- http://webdam.inria.fr/Alice/ [databases]
- https://hacker-tools.github.io/
Reading List
- [ ] The Art of Unix Programming
- [ ] The C Programming Language
- [ ] Gödel, Escher, Bach: An Eternal Golden Braid
- [ ] Deep Learning (Goodfellow, Bengio, Courville)
- [ ] Grokking Deep Learning
- [ ] Grokking Deep Reinforcement Learning
- [ ] Compilers: Principles, Techniques, and Tools (Dragon book)
- [ ] Code
- [ ] The Elements of Statistical Learning
- [ ] Structure and Interpretation of Computer Programs
- [ ] Hacker's Delight
- [ ] Concrete Mathematics
- [ ] The Art of Computer Programming
- [ ] Artificial Intelligence: A Modern Approach
- [ ] https://blog.ycombinator.com/learning-math-for-machine-learning/