Machine-Learning-Data-Science-Reuse
Gathers machine learning and data science techniques for problem solving.

Warning
THIS REPOSITORY LACKS COMMENTS, DOCUMENTATION AND STORYTELLING ON PURPOSE. IT IS MEANT FOR SELF-REUSE.
Most of the visualizations are self-explanatory, but they require at least a basic understanding of statistics and Python.
Some visualizations will not render on GitHub because it cannot display the output of certain SVG-based libraries, so please run the code on your own machine to see the results.
Why Genie? Because he can solve anything!
Table of contents
- R vs Python
- Preprocessing
- Natural Language Processing
- Suggestion Engine
- Image processing
- Signal processing
- Stacking
- Stochastic study
- Big-query
- Network study
- Visualization
- Markov
- English-text normalization
R vs Python
- CSV, Data Manipulation, Visualization
Preprocessing
- Handle missing values
- Rescaling (log, vector normalization, standardization, min-max scaling, boxcox)
- Features understanding
- Detecting outliers
- Encoding type comparison
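The rescaling methods listed above can be sketched in plain Python as follows. This is a minimal illustration, not the repository's own code: the function names and the `sample` data are hypothetical, and the Box-Cox variant takes a fixed `lam` supplied by the caller rather than estimating it from the data.

```python
import math
import statistics

def log_scale(values):
    """Natural log of (1 + x); assumes every x >= 0."""
    return [math.log1p(v) for v in values]

def min_max_scale(values):
    """Map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std if std else 0.0 for v in values]

def l2_normalize(values):
    """Vector normalization: scale the vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm if norm else 0.0 for v in values]

def box_cox(values, lam):
    """Box-Cox transform with a fixed lambda; assumes every x > 0."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

# Hypothetical sample data, for illustration only.
sample = [1.0, 2.0, 4.0, 8.0]
print(min_max_scale(sample))
print(standardize(sample))
```

In practice, libraries such as scikit-learn and SciPy provide these transforms, including a Box-Cox that fits lambda from the data.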
Natural Language Processing
- Bag of Words
- TF-IDF
- Hashing algorithm
- Models gathering (Bayes, SVM, XGB, LightGBM)
- sklearn pipeline
- N-gram
- Topic Modelling
- Naive-Bayes-SVM on hate speech
- Black Panther visualization using word clouds, semantic and k-means similarity network
- Semantic similarity on Malaysia hot topics
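As a rough illustration of the Bag of Words and TF-IDF items above, the sketch below counts tokens per document and then weights them. It uses the textbook formula tf × log(N / df), which is an assumption made here for simplicity; scikit-learn's `TfidfVectorizer` applies smoothing and normalization by default and will give different numbers. The whitespace tokenizer, function names and sample documents are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def bag_of_words(docs):
    """One term-count mapping per document."""
    return [Counter(tokenize(doc)) for doc in docs]

def tf_idf(docs):
    """Unsmoothed TF-IDF: (count / doc length) * log(N / document frequency)."""
    bags = bag_of_words(docs)
    n_docs = len(bags)
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())  # each document counts a term once
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights

# Hypothetical documents, for illustration only.
docs = ["the cat sat", "the dog sat down"]
for row in tf_idf(docs):
    print(row)
```

Terms that appear in every document get a weight of zero under this formula, which is why smoothed variants are common in practice.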
Suggestion Engine using Nearest-Euclidean and Gaussian Distribution
- Anime
- Game
- Movie
- Kickstarter projects
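The nearest-Euclidean part of the suggestion engine can be sketched as below, assuming each item is described by a numeric feature vector: the engine ranks the other items by their distance to one the user liked. The catalog, item names and feature values are hypothetical, and the Gaussian-distribution part is not covered here.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def suggest(catalog, liked, k=2):
    """Return the k items whose features are closest to the liked item."""
    target = catalog[liked]
    ranked = sorted(
        (name for name in catalog if name != liked),
        key=lambda name: euclidean(catalog[name], target),
    )
    return ranked[:k]

# Hypothetical catalog: item name -> feature vector (e.g. genre scores).
catalog = {
    "title_a": [0.9, 0.1, 0.0],
    "title_b": [0.8, 0.2, 0.1],
    "title_c": [0.1, 0.9, 0.3],
    "title_d": [0.0, 0.8, 0.9],
}
print(suggest(catalog, "title_a"))
```

Features on different scales should be rescaled first, otherwise the largest-valued feature dominates the distance.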
Image processing
- Augmentation (flip, rotate, shifting, zoom, shear, channel shift, grayscale, contrast, saturation)
- RGB subdivide
- hog-featuring
- image segmentation, nucleus
- K Nearest Neighbors on PCA / NMF
- SVD study on nearest neighbors
- Image warping to full A4
Signal processing
- Blurring on 1D Signal (loop, and FFT)
- Blurring on 2D Signal (loop)
- Convolution of 2 signals
- Pass filter for frequencies
- Signal smoothing
- Signal cross-correlation
- Augmentation (pitching, speed, distribution noise, shifting, silent shifting)
- Featuring (MFCC, log-energy, feature cube, power spectrum)
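For the "Blurring on 1D Signal (loop, and FFT)" item, the two approaches can be compared with the sketch below: blurring is a convolution with an averaging kernel, computed once with nested loops and once by multiplying spectra. It assumes NumPy is installed; the function names, the sample signal and the 3-point box kernel are hypothetical choices.

```python
import numpy as np

def blur_loop(signal, kernel):
    """Full linear convolution computed with explicit loops."""
    n, m = len(signal), len(kernel)
    out = np.zeros(n + m - 1)
    for i in range(n):
        for j in range(m):
            out[i + j] += signal[i] * kernel[j]
    return out

def blur_fft(signal, kernel):
    """Same convolution via the FFT: multiply spectra, then invert."""
    size = len(signal) + len(kernel) - 1  # zero-pad to avoid circular wrap-around
    spectrum = np.fft.rfft(signal, size) * np.fft.rfft(kernel, size)
    return np.fft.irfft(spectrum, size)

# Hypothetical signal with two spikes, blurred by a 3-point moving average.
signal = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0])
kernel = np.ones(3) / 3.0

print(np.allclose(blur_loop(signal, kernel), blur_fft(signal, kernel)))
```

The loop version costs O(n·m) operations while the FFT version costs O(n log n), so the FFT route tends to pay off for long kernels.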
Stacking
- binary
- regression
- multi-classes
- stack multiple models from sklearn regressor with XGB
Stochastic study
- Cryptocurrencies correlation
- Predict cryptocurrencies with multiple stacks
- Simple stock analysis
- ARIMA for flight prediction
- TESLA market study
Big-query
- Integrate BigQuery with pandas in Python
- Medicare queries with plotly visualization
Network study
- Graph nodes showing whom each person spoke to most
- Spooky social network analysis
- Taxi nodes analysis
- Stackoverflow tags analysis
- Donald Trump news social network
- Najib Razak Twitter social network
Visualization
- Geographic using basemap
- Folium map and time analysis
- Israel graph visualization
- Israel political landscape
- Distribution age vs type for library
- Growth study for library
- botnet attack analysis
- Plotly geo-mapping 101
- Plotly bombing mapping visualization
- Easy plotly using cufflinks
- Plotly pokemon data
- Rare visualization
- Dynamic map visualization using plotly and folium
- Kaggle 2018 Report
Markov
- Independent variables on weather forecast
- Dependent variables on text dataset
- Shakespeare character-wise generator
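A character-wise Markov generator like the one listed above can be sketched with the standard library alone: record which character follows each `order`-character state, then walk the chain by sampling followers. The function names, the sample text and the choice of `order=2` are assumptions for illustration, not taken from the repository.

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each `order`-character state to the characters that follow it."""
    chain = defaultdict(list)
    for i in range(len(text) - order):
        chain[text[i:i + order]].append(text[i + order])
    return chain

def generate(chain, seed, length, rng):
    """Extend `seed` by up to `length` characters sampled from the chain.

    Assumes len(seed) equals the order used to build the chain.
    """
    order = len(seed)
    out = seed
    for _ in range(length):
        followers = chain.get(out[-order:])
        if not followers:  # unseen state: nothing to sample from
            break
        out += rng.choice(followers)
    return out

# Hypothetical training text, for illustration only.
text = "to be or not to be that is the question"
chain = build_chain(text, order=2)
print(generate(chain, "to", 30, random.Random(0)))
```

Repeated followers are kept in the list on purpose, so sampling uniformly from it reproduces the observed transition frequencies.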
English-text normalization
- Normalized texts (Dates, Measure, Decimals, Cardinals, Electronic - URL, Currency - Dollars, Telephone Numbers)
- Normalized texts (Cardinal, Digit, Ordinal, Letters, Address, Telephone, Electronic, Fractions, Money)
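To illustrate one of the categories above, the sketch below normalizes cardinal numbers into spoken-form words. It is a simplified example under stated assumptions: it only handles standalone integers from 0 to 999, treats the two sides of a decimal such as "3.5" as separate cardinals, and uses hypothetical function names rather than the repository's own.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def cardinal_to_words(n):
    """Spell out an integer in the range 0..999."""
    if not 0 <= n < 1000:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]
    hundreds, rest = divmod(n, 100)
    head = ONES[hundreds] + " hundred"
    return head if rest == 0 else head + " " + cardinal_to_words(rest)

def normalize_cardinals(text):
    """Replace standalone 1-3 digit numbers with their spoken form."""
    return re.sub(
        r"\b\d{1,3}\b",
        lambda match: cardinal_to_words(int(match.group())),
        text,
    )

print(normalize_cardinals("I bought 12 apples and 305 oranges"))
```

A fuller normalizer would first classify each token (date, money, telephone and so on) and then dispatch to a category-specific rule like this one.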