ElizaLo/Data-Science: Projects and awesome list for all Data Science fie...

🔄 Constantly updated. Subscribe so you don't miss anything.

[ ] For Natural Language Processing (NLP = NLU + NLG) please check the Natural Language Processing repository.

[ ] For Machine Learning algorithms please check the Machine Learning repository.

[ ] For Deep Learning algorithms please check the Deep Learning repository.

[ ] For Computer Vision please check the Computer Vision repository.

Data Science Tasks

Folders with all materials for specific task/domain

Educational Platforms

OpenEDU

University courses 👩‍🎓

Title	Description
MIT OpenCourseWare	MIT OpenCourseWare on YouTube

Julia language

Title	Description
Introduction to Computational Thinking	MIT 18.S191 (aka 6.S083, aka 22.S092): Fall 2020, Spring 2021

Time Series

Title	Description
MIT 18.S096 Topics in Mathematics w Applications in Finance	The purpose of the class is to expose undergraduate and graduate students to the mathematical concepts and techniques used in the financial industry. Mathematics lectures are mixed with lectures illustrating the corresponding application in the financial industry. MIT mathematicians teach the mathematics part while industry professionals give the lectures on applications in finance. Video lectures

Online courses

Specialization: Data Science for Executives
Machine Learning Foundations

Machine Learning Foundations: Linear Algebra, Calculus, Statistics & Computer Science

GitHub Repositories :octocat:

Title	Description
Data Science for Beginners - A Curriculum	Azure Cloud Advocates at Microsoft are pleased to offer a 10-week, 20-lesson curriculum all about Data Science. Each lesson includes pre-lesson and post-lesson quizzes, written instructions to complete the lesson, a solution, and an assignment. Our project-based pedagogy allows you to learn while building, a proven way for new skills to 'stick'.
Machine Learning for Beginners - A Curriculum	Azure Cloud Advocates at Microsoft are pleased to offer a 12-week, 26-lesson curriculum all about Machine Learning. In this curriculum, you will learn about what is sometimes called classic machine learning, using primarily Scikit-learn as a library and avoiding deep learning, which is covered in our forthcoming 'AI for Beginners' curriculum.
start-machine-learning	A complete guide to start and improve in machine learning (ML), artificial intelligence (AI) in 2021 without ANY background in the field and stay up-to-date with the latest news and state-of-the-art techniques
Data Science Specialization (Johns Hopkins, Coursera)	https://github.com/mGalarnyk/datasciencecoursera

Books

Python Data Science Handbook
Hands-On Machine Learning with Scikit-Learn and TensorFlow
- Machine Learning Notebooks
  - A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn and TensorFlow.

GitHub Repositories :octocat:

Title	Description
Awesome Artificial Intelligence (AI)	A curated list of Artificial Intelligence (AI) courses, books, video lectures and papers.
ml-surveys	Survey papers summarizing advances in deep learning, NLP, CV, graphs, reinforcement learning, recommendations, etc.
awesome-analytics-engineering	Awesome list of resources for analytics engineers.

Tools

Title	Description
WeightWatcher	WeightWatcher (WW) is an open-source diagnostic tool for analyzing Deep Neural Networks (DNN) without needing access to training or even test data.

Papers

Title	Description, Information
2021: A Year Full of Amazing AI papers- A Review / 📌 [work in progress...]	A curated list of the latest breakthroughs in AI by release date with a clear video explanation, link to a more in-depth article, and code. [work in progress]

Certifications

[ ] TensorFlow Developer Certificate
[ ] Certified Analytics Professional (CAP)
[ ] Cloudera Certified Associate: Data Analyst
[ ] Cloudera Certified Professional: CCP Data Engineer
[ ] Data Science Council of America (DASCA) Senior Data Scientist (SDS)
[ ] Data Science Council of America (DASCA) Principal Data Scientist (PDS)
[ ] Dell EMC Data Science Track
[ ] Google Certified Professional Data Engineer
[ ] Google Data and Machine Learning
[ ] IBM Data Science Professional Certificate
[ ] Microsoft MCSE: Data Management and Analytics
[ ] Microsoft Certified Azure Data Scientist Associate
[ ] Open Certified Data Scientist
[ ] SAS Certified Advanced Analytics Professional
[ ] SAS Certified Big Data Professional
[ ] SAS Certified Data Scientist

Online Conferences, Meetups, Data Summer Schools

Live Webinars & On-demand Recordings by ODSC COMMUNITY
Data Science fwdays'19 (playlist)
Webinars 2020, Computer Science UCU
Eastern European Machine Learning Summer School, 2020 (Deep Learning and Reinforcement Learning)
- Program
- Practical Sessions 2020, GitHub Repository

Twitter

Podcasts

Blogs

Companies Blogs

- :octocat: Software Engineering Blogs
- A curated list of engineering blogs
Amazon | Science
AWS Machine Learning Blog
The Netflix Tech Blog
Uber Engineering
NVIDIA Developer

Other Blogs

Towards AI
- Tutorials
  - AI-related tutorials.
Data Notes
Louis Bouchard | @What's AI - Making AI Accessible
Michael Galarnyk
Data Science Dojo

Articles

Communities

Title	Description
Coursera Community Data Science
Locally Optimistic	A community for current and aspiring data analytics leaders. Started in NYC in early 2018 as an outgrowth of a Slack channel / extremely informal meetup group, we hope to share our thoughts / opinions / experiences / trials / tribulations with others in the community.
Deepchecks Community	A place to talk about MLOps news, articles, conferences, and really just anything in the MLOps space.

Telegram Channels

DataScience Digest
- Collection of the top articles, videos, events, books and jobs on Machine Learning, Deep Learning, NLP, Computer Vision and other aspects of Data Science.

Main skills required by data science vacancies

Research conducted by the Faculty of Applied Sciences at UCU. Link to the main article.

Big Data Software Engineer / Data Engineer

Linear algebra. Calculus. Statistics and Probability Theory.
Machine Learning Algorithms: regression, simulation, scenario analysis, modeling, clustering, decision trees, etc.
Python 3, Pandas, scikit-learn, Keras, TensorFlow, NumPy, PyTorch.
Data visualization.
Software engineering methodologies, functional programming or object-oriented programming.
DevOps: containerization and orchestration.
Classic DBs (relational or object): MySQL, PostgreSQL, RDS.
NoSQL (document, key-value, wide-column): MongoDB, Cassandra, HBase, Elasticsearch, Redis, DynamoDB.
NewSQL (hybrid/in-memory): MemSQL, VoltDB.
Query engines: Impala, Presto.
Cloud platforms (GCP, AWS). Cloud computation (Dataflow, Dataproc). Streaming (Pub/Sub, Kafka). Data storage (BigQuery, Cloud SQL, Cloud Spanner, Firestore, BigTable).
ETL Concepts / Processes.
Data Warehouse technologies, Data Lake architecture.
Data modeling: Bachman diagrams, Chen’s Notation, Object-relational mapping, etc.
Processing frameworks: Apache Spark (PySpark/SparkR/sparklyr), Flink, Beam, Kafka Streams.
Data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.

Data Scientist

Python (PyCharm, Pandas, NumPy, bs4, sklearn, scipy). R.
Linear algebra. Calculus. Statistics.
Machine Learning techniques (Decision Trees, Random Forest, SVM, Bayesian, XGBoost, K-Nearest Neighbors) and concepts: regression and classification, clustering, feature selection, feature engineering, the curse of dimensionality, bias-variance tradeoff.
Data visualization.
Data Mining (Clustering, Frequent Pattern Mining, Outliers Detection).
Neural Networks and ML Packages (sklearn/xgboost/TensorFlow/Keras, H2O).
Cloud platforms (GCP, AWS). Cloud computation (Dataflow, Dataproc). Streaming (Pub/Sub, Kafka). Data storage (BigQuery, Cloud SQL, Cloud Spanner, Firestore, BigTable).
Databases: SQL and NoSQL, AWS cloud storage, GDPR data privacy.
Processing frameworks: Hadoop, Spark.
Business Intelligence Software (Power BI, Tableau, Qlik, Cognos Analytics).

Machine Learning Engineer

Computer science fundamentals, algorithms, mathematics, linear algebra, probability, and statistics.
Python (Pandas, NumPy, scikit-learn, TensorFlow, Keras).
Python visualization tools: matplotlib/seaborn, Plotly.
Machine Learning techniques (Decision Trees, Random Forest, SVM, Bayesian, XGBoost, K-Nearest Neighbors) and concepts: regression and classification, clustering, feature selection, feature engineering, the curse of dimensionality, bias-variance tradeoff.
Deep Learning: Recurrent Neural Network (LSTM/GRU units), Convolutional Neural Network.
Machine learning frameworks (TensorFlow, Caffe2, PyTorch, Spark ML, scikit-learn) and ML techniques: GAN, ASR, RL.
Databases: SQL and NoSQL. Hadoop ecosystem.
Processing frameworks: Apache Spark (PySpark/SparkR/sparklyr).
Cloud platforms (GCP, AWS).

Data Analyst

Math, Statistics (regression, properties of distributions, statistical tests, and proper usage, etc.) and Probability Theory.
Statistical programming software (R, Python, SAS, MATLAB).
Predictive analytics (regression models, time-series analysis and forecasting, survival or duration analysis).
BI tools: Google Data Studio / Microsoft Power BI / Tableau.
Classic DBs: MySQL.
MS Excel.
A/B testing.
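
A/B testing usually comes down to comparing two conversion rates. The sketch below is one common way to do that, a two-sided two-proportion z-test with a pooled standard error, written with the standard library only. The function name and the visitor and conversion counts are hypothetical and not taken from the UCU research.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical experiment: 1000 visitors per variant.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

The normal approximation assumes reasonably large samples; for small counts an exact test would be a safer choice.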

NLP Engineer / NLP Data Scientist

Python (sklearn, nltk, gensim, spacy, TensorFlow, PyTorch, Keras) and Python Data Science toolkit: Jupyter Notebook, Pandas, NumPy, Matplotlib/Seaborn, SciPy.
Databases: SQL and NoSQL (MySQL, MongoDB, PostgreSQL).
NLP libraries: NLTK, spaCy, Stanford CoreNLP, etc.
NLP techniques for text representation (TF-IDF, Word2Vec), semantic extraction, data structures and modeling.
Methods of Information Extraction (NER, terminology extraction, keywords extraction, etc.).
Machine Learning techniques and concepts (regression, trees, SVM, ensembles) for NLP tasks.
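
To make the TF-IDF text representation named above concrete, here is a minimal sketch using only the standard library. It assumes whitespace tokenization and the plain weighting (relative term frequency times log(N / df)); libraries typically use smoothed variants, so their numbers will differ. The function name and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document: relative tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


# Hypothetical toy corpus.
docs = ["the cat sat", "the dog sat", "the cat ran"]
for doc, w in zip(docs, tf_idf(docs)):
    print(doc, "->", {t: round(v, 3) for t, v in w.items()})
```

A term that appears in every document (here "the") gets weight zero, while a term unique to one document gets the largest weight in it.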

CV Engineer

Linear Algebra. Geometry. Calculus. Statistics and Probability theory.
Python3, numpy, pandas, seaborn, scipy.
Computer vision / image processing libraries such as: OpenCV, Pillow.
Neural network architectures: Convolutional Neural Networks (Inception, residual), LSTM, GAN.
Neural network frameworks: TensorFlow, PyTorch.
Computer vision algorithms and architectures: object detection, segmentation, face recognition, image processing, video processing.
Real-time CV systems based on Deep Learning.
Cloud model training (GCP, AWS), Cloud integration, Cloud Platforms.
Performance metrics in object detection and classification, such as mAP and related.
Big Data (Hadoop, Spark, Hive).

Deep Learning Engineer / Deep Learning Research Engineer

Python3: numpy, scikit-learn, pandas, scipy.
Statistics (regression, properties of distributions, statistical tests, and proper usage, etc.) and probability theory.
Deep learning frameworks: TensorFlow, PyTorch, MXNet, Caffe, Keras.
Deep learning architectures: VGG, ResNet, Inception, MobileNet.
Deepnets, hyperparameter optimization, visualization, interpretation.
Machine learning models.

The Data Science Interview Preparation

Typical interview construction

Software Engineering (for more visit Interview Preparation Repository)
Applied Statistics
Machine Learning
Data Wrangling, Manipulation and Visualisation

2. Applied Statistics

Descriptive statistics (What distribution does my data follow? What are the modes of the distribution, the expectation, the variance?)
Probability theory (Given my data follows a Binomial distribution, what is the probability of observing 5 paying customers in 10 click-through events?)
Hypothesis testing (forming the basis of any question on A/B testing, t-tests, ANOVA, chi-squared tests, etc.)
Regression (Is the relationship between my variables linear? What are potential sources of bias? What are the assumptions behind the ordinary least squares solution?)
Bayesian Inference (What are some advantages/disadvantages vs frequentist methods?)
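
The binomial question above can be worked through with the probability mass function P(X = k) = C(n, k) p^k (1 - p)^(n - k). The question does not state a conversion rate, so the value p = 0.3 below is a hypothetical assumption chosen only for illustration, as is the function name.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical conversion rate; the interview question does not specify one.
p_convert = 0.3
prob = binomial_pmf(k=5, n=10, p=p_convert)
print(f"P(5 paying customers in 10 click-throughs) = {prob:.4f}")  # 0.1029
```

With a different assumed rate the answer changes accordingly; the structure of the calculation stays the same.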

Introduction to Probability and Statistics, an open course on everything listed above including questions and an exam to help you test your knowledge.
Machine Learning: A Bayesian and Optimization Perspective by Sergios Theodoridis. This is more a machine learning text than a specific primer on applied statistics, but the linear algebra approaches outlined here really help drive home the key statistical concepts on regression.
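
As a small illustration of the linear algebra view of regression mentioned above, ordinary least squares can be written as the normal equations (XᵀX) b = Xᵀy. The sketch below solves them for the one-feature case with an intercept, using only the standard library. The function name and the toy data are hypothetical, and this is not taken from the book.

```python
def ols_fit(xs, ys):
    """Fit y = b0 + b1 * x by solving the 2x2 normal equations (X^T X) b = X^T y."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy]; solve with Cramer's rule.
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are identical; the slope is not identifiable")
    b0 = (sy * sxx - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


# Hypothetical noise-free data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
b0, b1 = ols_fit(xs, ys)
print(b0, b1)  # 1.0 2.0
```

The zero-determinant check corresponds to one of the OLS assumptions: the design matrix must have full column rank.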

Sample Quant Exam
