data-science
Notebooks and Python about data science
Learning data science step by step
Most of the examples presented in Internet tutorials either use powerful libraries (Scikit Learn, Keras...), rely on complex models (neural nets), or are based on data samples with many features.
In this collection of workbooks, I want to start from simple examples and raw Python code, then progressively move to more complex data sets, techniques and libraries.
Most datasets are deliberately generated, so that their parameters can be adjusted to fit the demonstration.
The notebooks are Jupyter notebooks written for Python 3.7.
To read or edit the notebooks you may:
- Browse notebooks in HTML from the HTML table of contents
- Open this repository in nbviewer
- Clone the repository to test and modify the notebooks locally within Jupyter or JupyterLab

Linear regression
Let's start from a simple univariate example and then progressively add complexity:
- Univariate function approximation with linear regression
- Closed form, with Numpy, Scipy or SciKit Learn, optionally with gradient descent and stochastic gradient descent (HTML / Notebook)
- Using TensorFlow (HTML / Jupyter)
- Bivariate function approximation with linear regression
- Closed form, using SciKit Learn, (stochastic) gradient descent, adding a regularizer (HTML / Jupyter)
- Using Keras, single perceptron linear regression, two-layer model (HTML / Jupyter)
- Model confidence and quality evaluation in the Gaussian model case (HTML / Jupyter)
- Feature engineering or feature learning with linear regression (HTML / Jupyter)
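
As an illustration of the first items above, here is a minimal sketch of univariate linear regression on a generated data set, solved once in closed form and once with batch gradient descent. The data set (`y = 2x + 1` plus noise), the learning rate and the iteration count are assumptions chosen for the example; they are not taken from the notebooks.

```python
import numpy as np

# Hypothetical generated data set: y = 2x + 1 plus Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=200)

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])

# Closed form: least squares solution, w = [intercept, slope]
w_closed, *_ = np.linalg.lstsq(X, y, rcond=None)

# Batch gradient descent on the mean squared error
w_gd = np.zeros(2)
learning_rate = 0.01  # assumed value, small enough for x in [0, 10]
for _ in range(5000):
    gradient = 2.0 / len(y) * X.T @ (X @ w_gd - y)
    w_gd -= learning_rate * gradient

print("closed form      :", w_closed)
print("gradient descent :", w_gd)
```

Both approaches should end up close to the same intercept and slope; the gradient descent version needs a learning rate adapted to the scale of `x`.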
Classification
Binary classification with parametric models
- Univariate function as a boundary on two-class data, approximated with logistic regression
- Homemade, using SciKit Learn (HTML / Jupyter)
- Bivariate parametric function as a boundary, approximated with logistic regression,
- Homemade, using SciKit Learn (HTML / Jupyter)
- Using TensorFlow (HTML / Jupyter)
- Using Keras, adding regularizers and optionally a two-layer neural net (HTML / Jupyter)
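
To give an idea of what a "homemade" logistic regression can look like, here is a minimal sketch using only NumPy. It assumes two hypothetical Gaussian blobs as the two classes and a linear boundary; the learning rate and iteration count are arbitrary choices, and the notebooks may proceed differently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-class data: one Gaussian blob per class
n = 200
X0 = rng.normal(loc=[-2.0, -2.0], scale=1.0, size=(n, 2))
X1 = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(n, 2))
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(n), np.ones(n)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Add a bias column, then minimize the cross-entropy with gradient descent
Xb = np.column_stack([np.ones(len(X)), X])
w = np.zeros(3)
for _ in range(1000):
    p = sigmoid(Xb @ w)                 # predicted probability of class 1
    gradient = Xb.T @ (p - y) / len(y)  # gradient of the mean cross-entropy
    w -= 0.1 * gradient

# The decision boundary is the line where Xb @ w == 0
accuracy = np.mean((sigmoid(Xb @ w) >= 0.5) == (y == 1))
print("weights:", w, "accuracy:", accuracy)
```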
Binary classification with non-parametric models
- Bivariate with K Nearest Neighbors (KNN), homemade, using SciKit Learn (HTML / Jupyter)
- Solving a non-linear problem with a Support Vector Machine (SVM) (HTML / Jupyter)
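
A homemade K Nearest Neighbors classifier fits in a few lines of NumPy. The sketch below assumes binary labels 0/1, Euclidean distance and a majority vote; the function name `knn_predict` and the generated data are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=5):
    """Predict 0/1 labels by majority vote among the k nearest training points."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train)
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dist2 = np.sum(diffs ** 2, axis=2)
    # Indices of the k closest training points for each query point
    nearest = np.argsort(dist2, axis=1)[:, :k]
    votes = y_train[nearest]
    # With an odd k there is no tie between the two classes
    return (votes.mean(axis=1) > 0.5).astype(int)

# Hypothetical bivariate data: one Gaussian blob per class
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(loc=[-2.0, -2.0], scale=1.0, size=(100, 2)),
    rng.normal(loc=[2.0, 2.0], scale=1.0, size=(100, 2)),
])
y_train = np.concatenate([np.zeros(100, dtype=int), np.ones(100, dtype=int)])

X_query = np.array([[-2.0, -2.0], [2.0, 2.0]])
predictions = knn_predict(X_train, y_train, X_query, k=5)
print(predictions)
```

Unlike the parametric models above, nothing is fitted here: the training set itself is the model, and `k` is the only setting to tune.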
Multi-class classification with regression or neural networks
- Two features to separate the 2D plane into 3 or more categories
- Using Keras on a linearly separable problem (Czech flag) and on a non-linearly separable problem (Norway flag), with 2- and 3-layer neural nets to handle the second problem (HTML / Jupyter)
Multi-class classification with non-parametric models
- Multi-class classification using decision trees (HTML / Jupyter)
Deep learning
Convolutional neural networks (CNN)
- Introduction to CNN as an image filter
- Part 1 - Horizontal edge detector using simple 1-2 layer neural nets (HTML / Jupyter)
- Coming soon: Part 2 - Combined horizontal-vertical edge detector using multiple convolutional units
- CNN versus Dense comparison on MNIST
- Part 1 - Design and performance comparison (HTML / Jupyter)
- Part 2 - Visualization with UMAP (HTML / Jupyter)
- Coming soon: Part 3 - Resilience to geometric transformations
- Interpretability
- Activation maps on CIFAR-10 (HTML / Jupyter)
- Saliency maps on CIFAR-10 (HTML / Jupyter)
- Saliency maps on Imagenet (subset) with ResNet50 (HTML / Jupyter) (work in progress)
- CNN as a graph using NetworkX, extract centrality values (HTML / Jupyter) (work in progress)
- Other CNNs
- Fashion MNIST CNN with Data Augmentation (HTML / Jupyter)
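
To illustrate the idea of a convolution acting as an image filter, here is a minimal NumPy sketch of a horizontal edge detector. The kernel is written by hand for the example, whereas a CNN would learn such weights from data; the helper `correlate2d_valid` and the test image are hypothetical.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide the kernel over the image without padding (cross-correlation,
    which is the operation used by convolutional layers)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 6x6 image: dark top half (0), bright bottom half (1)
image = np.zeros((6, 6))
image[3:, :] = 1.0

# Hand-written kernel: bottom row minus top row, so it responds to
# intensity changes along the vertical axis, i.e. horizontal edges
kernel = np.array([[-1.0, -1.0, -1.0],
                   [ 0.0,  0.0,  0.0],
                   [ 1.0,  1.0,  1.0]])

response = correlate2d_valid(image, kernel)
print(response)
```

The response is zero in the uniform areas and non-zero only in the rows whose 3x3 window straddles the dark-to-bright transition.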
Generative networks (VAE, GAN)
- Generative Adversarial Networks (GAN), the basics on MNIST, with TensorFlow 2 / Keras and TensorFlow Datasets
- Original GAN using Dense layers (HTML / Jupyter)
- GAN with convolutions (DCGAN) (HTML / Jupyter)
- GAN with convolutions (DCGAN), no Dense layer on the generator path (HTML / Jupyter)
- GAN and Bayesian network on ski outing reports and prediction of global warming impact on skiing in the Alps (HTML / Jupyter)
Natural Language Processing (NLP)
- Classification of mountaineering routes based on the textual description with fastText and Tensorflow (HTML / Jupyter)
- Summarized in Medium article "Full NLP use case with fastText and Tensorflow 2"
- Data preparation (HTML / Jupyter)
Reading list
Books
- Deep Learning - I. Goodfellow, Y. Bengio, A. Courville, The MIT Press.
- Very good overview of machine learning and its extension to deep learning
- An Introduction to Statistical Learning with Applications in R - G. James, D. Witten, T. Hastie, R. Tibshirani.
- Traditional machine learning including regressions, clustering, SVM...
Nice notebooks
Tutorials and courses
Papers
- You Only Look Once: Unified, Real-Time Object Detection - J. Redmon et al.
- Learning to Forget: Continual Prediction with LSTM - F. A. Gers et al.
- What are the biases in my word embedding? - N. Swinger et al.