Data-Science-in-Python
Resources to help you get started with Data Science
Table of contents
- Installation
- Books
- Online courses
- Youtube channels & Videos
- Presentations
- Data Science using Python
- Competitions
- Data Science Ideas
- Data Sets
Installation
or
- Winpy (alternative)
What is Data Science?
- What is Data Science @ O'reilly
- What is Data Science @ Quora
- The sexiest job of the 21st century
- What is data science
- What is a data scientist
- Wikipedia
- a very short history of #datascience
- An Introduction to Data Science, PDF.
- Data Science Methodology by John Rollins PhD
- A Day in the Life of a Data Scientist by Rutgers University
Books
- Python Data Science Handbook
- The Data Science Handbook
- The Art of Data Usability - Early access
- Think Like a Data Scientist
- R in Action, Second Edition
- Introducing Data Science
- Practical Data Science with R
- Exploring Data Science - free eBook sampler
- Exploring the Data Jungle - free eBook sampler
Online courses
- Applied Data Science with Python Specialization
- Microsoft Professional Program in Data Science
- Intro to Data Science
- Python Data camp
- Introduction to Python for Data Science
- Intro to Data Science by Microsoft
Youtube Videos & Channels
- What is machine learning?
- Andrew Ng: Deep Learning, Self-Taught Learning and Unsupervised Feature Learning
- Deep Learning: Intelligence from Big Data
- Interview with Google's AI and Deep Learning 'Godfather' Geoffrey Hinton
- Introduction to Deep Learning with Python
- What is machine learning, and how does it work?
- Data School - Data Science Education
- Neural Nets for Newbies by Melanie Warrick (May 2015)
- Neural Networks video series by Hugo Larochelle
- Google DeepMind co-founder Shane Legg - Machine Super Intelligence
Presentations
- How to Become a Data Scientist
- Introduction to Data Science
- Intro to Data Science for Enterprise Big Data
- How to Interview a Data Scientist
- How to Share Data with a Statistician
- The Science of a Great Career in Data Science
- What Does a Data Scientist Do?
- Building Data Start-Ups: Fast, Big, and Focused
- How to win data science competitions with Deep Learning
Data Science using Python
This list covers only Python, as many readers are already familiar with the language. For R, see Data Science tutorials using R.
Learning Python
numpy
numpy is a Python library which provides large multidimensional arrays and fast mathematical operations on them.
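To make the description above concrete, here is a minimal sketch of a multidimensional array and a few vectorized operations on it. The array contents and variable names are made up for illustration.

```python
import numpy as np

# A 2-D array with 3 rows and 4 columns holding the integers 0..11.
a = np.arange(12).reshape(3, 4)

# Element-wise arithmetic is applied to the whole array at once,
# without an explicit Python loop.
doubled = a * 2

# Reduction along an axis: axis=0 collapses the rows,
# leaving one sum per column.
column_sums = a.sum(axis=0)

# Broadcasting: the per-column means (shape (4,)) are subtracted
# from every row of the (3, 4) array.
centered = a - a.mean(axis=0)

print(a.shape)       # (3, 4)
print(column_sums)   # [12 15 18 21]
print(centered[0])   # [-4. -4. -4. -4.]
```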
pandas
pandas provides efficient data structures and analysis tools for Python. It is built on top of numpy.
- Introduction to pandas
- DataCamp pandas foundations - Paid course, but 30 free days upon account creation (enough to complete course).
- Pandas cheatsheet - Quick overview of the most important functions.
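As a rough illustration of what the resources above cover, the sketch below builds a small DataFrame, filters it, and aggregates it. The data (cities, years, sales figures) is hypothetical and exists only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for this example.
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Bergen", "Bergen"],
    "year": [2020, 2021, 2020, 2021],
    "sales": [100, 120, 80, 90],
})

# Boolean filtering: keep only the rows for 2021.
recent = df[df["year"] == 2021]

# Group by a column and aggregate another one.
total_by_city = df.groupby("city")["sales"].sum()

# A column can be handed to numpy as a plain array.
sales_array = df["sales"].to_numpy()

print(len(recent))              # 2
print(total_by_city["Oslo"])    # 220
print(total_by_city["Bergen"])  # 170
print(isinstance(sales_array, np.ndarray))  # True
```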
scikit-learn
scikit-learn is one of the most widely used libraries for Machine Learning and Data Science in Python.
- Introduction and first model application
- Rough guide for choosing estimators
- Scikit-learn complete user guide
- Model ensemble: Implementation in Python
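For orientation, a first model application might look like the sketch below. It is not taken from the linked tutorials. The choice of the bundled iris dataset, the k-nearest-neighbours estimator, and the split parameters are assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples to estimate performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Most scikit-learn estimators follow the same fit / predict / score pattern.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Because estimators share this interface, swapping in a different model usually means changing only the line that constructs it.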
Jupyter Notebook
Jupyter Notebook is a web application for easy data visualisation and code presentation.
- Downloading and running first Jupyter notebook
- Example notebook for data exploration
- Seaborn data visualization tutorial - Plot library that works great with Jupyter.
Common Algorithms and Procedures
- Supervised vs unsupervised learning - The two most common types of Machine Learning algorithms.
- 9 important Data Science algorithms and their implementation
- Cross validation - Evaluate the performance of your algorithm / model.
- Feature engineering - Modifying the data to better model predictions.
- Scientific introduction to 10 important Data Science algorithms
- Model ensemble: Explanation - Combine multiple models into one for better performance.
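Two of the procedures listed above, cross validation and model ensembles, can be sketched together with scikit-learn. This is a minimal illustration under assumptions: the iris dataset, the three base models, hard voting, and 5 folds are all arbitrary choices for the example, not recommendations from the linked articles.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Ensemble: three different supervised models vote on each prediction,
# and the majority class wins ("hard" voting).
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",
)

# Cross validation: split the data into 5 folds, train on 4 and
# evaluate on the remaining one, rotating so each fold is tested once.
scores = cross_val_score(ensemble, X, y, cv=5)

print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())
```

Reporting the mean over folds gives a steadier estimate than a single train/test split, which is why cross validation is commonly used to compare a single model against an ensemble.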
Competitions
Some data mining competition platforms
Data Science Ideas
Human Resources
- Competency forecasting
- Employee churn analytics
- Employee performance analytics
- Network analytics on employee interactions
- Resume matching, preselection and tagging
- Workforce planning
Finance
- Cost analytics
- Fraud detection
- Waste and abuse detection
IT
- Component quality analytics
- Cybercrime detection
- Server performance monitoring and alerting
- Automatic routing, reply, or clustering of incident management tickets
Marketing
- Churn/Customer attrition
- Customer segmentation
- Life Time Value
- Personalized advertising
- Product recommendation engines
- Marketing Optimization
- Social Media Analytics
- Text Analytics on customer complaints
Sales
- Cross-sell opportunities using propensity models
- Lead scoring
- Price elasticity
- Revenue forecasting or Kaggle
Supply chain
- Demand forecasting
- Gas purchase optimization
- Inventory forecasting
- Optimal routes
- Warehouse location optimization
Insurance
- Fraud detection
- Litigation prediction
- Pricing using telematics
- Solvency II and ORSA compliance
- Risk analytics
Life sciences
- Design of experiments
- R&D portfolio optimization
Manufacturing
Finance and Tax
Public Safety
- Crime Wave Detection
- Patrolling Suggestions (Preventative Policing)
- Crime Case Resolution Prediction
- Crime Clustering
- Complex/Organised Crime network detection
- Terrorist Cell Identification
- Alerting & Officer Safety
- Criminal Evolution
- Domestic Violence
- Radicalisation prediction
- Mass scale surveillance
Data Sets
- Academic Torrents
- hadoopilluminated.com
- data.gov - The home of the U.S. Government's open data
- United States Census Bureau
- usgovxml.com
- enigma.com - Navigate the world of public data - Quickly search and analyze billions of public records published by governments, companies and organizations.
- datahub.io
- aws.amazon.com/datasets
- databib.org
- datacite.org
- quandl.com - Get the data you need in the form you want; instant download, API or direct to your app.
- figshare.com
- GeoLite Legacy Downloadable Databases
- Quora's Big Datasets Answer
- Public Big Data Sets
- Houston Data Portal
- Kaggle Data Sources
- Kaggle Datasets
- A Deep Catalog of Human Genetic Variation
- A community-curated database of well-known people, places, and things
- Google Public Data
- World Bank Data
- NYC Taxi data
- Open Data Philly - Connecting people with data for Philadelphia
- A list of useful sources - A blog post that includes many data set databases
- grouplens.org - Sample movie (with ratings), book and wiki datasets
- UC Irvine Machine Learning Repository - contains data sets good for machine learning
- research-quality data sets by Hilary Mason
- National Climatic Data Center - NOAA
- ClimateData.us (related: U.S. Climate Resilience Toolkit)
- r/datasets
- MapLight - provides a variety of data sets free of charge for uses that are freely available to the general public.
- GHDx - Institute for Health Metrics and Evaluation - a catalog of health and demographic datasets from around the world and including IHME results
- St. Louis Federal Reserve Economic Data - FRED
- New Zealand Institute of Economic Research – Data1850
- Dept. of Politics @ New York University
- Open Data Sources
- UNICEF Statistics and Monitoring
- UNICEF Data
- undata
- NASA SocioEconomic Data and Applications Center - SEDAC
- The GDELT Project
- Sweden, Statistics
- Github free data source list
- StackExchange Data Explorer - an open source tool for running arbitrary queries against public data from the Stack Exchange network.
- San Francisco Government Open Data
- IBM Blog about open data
- Open Data Index
- Liver Tumor Segmentation Challenge Dataset
Reference
- https://github.com/bulutyazilim/awesome-datascience
- https://github.com/JosPolfliet/awesome-datascience-ideas
- https://github.com/siboehm/awesome-learn-datascience