Sample techniques for a variety of feature extraction methods
Practical Feature Extraction
This repository contains a compendium of useful feature extraction techniques I have learned about over the years. If you have a favorite that I have missed, let me know.
Techniques covered (aspirationally)
Categorical
One-hot encoding
Hashed one-hot encoding
Unique ID
Binary encoding after sorting
Count encoding
Rank encoding
Rank-change
Naive Bayes Rate Encoding
Semantic embedding
tf.idf
Luduan terms *
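The following is a minimal Python sketch of three of the categorical encodings listed above: count encoding, rank encoding, and hashed one-hot encoding. The function names, the use of MD5 as a stable hash, the tie-breaking rule for ranks, and the default bucket count are illustrative assumptions, not code from this repository.

```python
import hashlib
from collections import Counter


def count_encode(values):
    """Replace each category with how often it occurs in the data."""
    counts = Counter(values)
    return [counts[v] for v in values]


def rank_encode(values):
    """Replace each category with its frequency rank (0 = most common).

    Ties are broken by the category's string form so the result is deterministic.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], str(v)))
    rank = {v: i for i, v in enumerate(ordered)}
    return [rank[v] for v in values]


def hashed_one_hot(value, n_buckets=16):
    """One-hot encode a category into a fixed number of hash buckets.

    MD5 is used as a stable hash because Python's built-in hash() of strings
    is randomized per process. Distinct categories can collide in one bucket.
    """
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "little") % n_buckets
    vector = [0] * n_buckets
    vector[index] = 1
    return vector


colors = ["red", "blue", "red", "green", "red", "blue"]
print(count_encode(colors))  # [3, 2, 3, 1, 3, 2]
print(rank_encode(colors))   # [0, 1, 0, 2, 0, 1]
```

Hashing trades occasional collisions for a fixed-width vector that needs no stored vocabulary, which is why it suits high-cardinality or open-ended categories.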
Numerical
Binning *
Rounding
Log
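A short sketch of the numerical transforms above (binning, rounding, and a log transform). The bin edges, the rounding step, and the choice of log(1 + x) rather than a plain log are assumptions made for illustration.

```python
import bisect
import math


def bin_value(x, edges):
    """Return the index of the bin that x falls into.

    `edges` must be sorted. Bin i covers [edges[i-1], edges[i]); values below
    edges[0] get bin 0 and values at or above edges[-1] get bin len(edges).
    """
    return bisect.bisect_right(edges, x)


def round_to(x, step):
    """Round x to the nearest multiple of `step` (ties go to the even multiple,
    as with Python's round())."""
    return round(x / step) * step


def log_feature(x):
    """log(1 + x): compresses heavy-tailed, non-negative values; 0 maps to 0."""
    return math.log1p(x)


edges = [10, 100, 1000]
print([bin_value(v, edges) for v in [5, 10, 50, 5000]])  # [0, 1, 1, 3]
print(round_to(1234, 100))  # 1200
print(log_feature(0.0))     # 0.0
```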
Temporal
Day of week, Hour of day, Weekend/holiday indicators
Quadrature encodings
Distance to event
Lagged features
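One way the temporal features above might look in Python: calendar fields, a weekend flag, sine/cosine ("quadrature") encodings of hour and weekday, and a simple lag. The feature names are hypothetical, and holiday indicators are omitted because they need an external holiday calendar.

```python
import math
from datetime import datetime


def time_features(ts):
    """Calendar features plus quadrature (sine/cosine) encodings of cyclic time."""
    hour = ts.hour + ts.minute / 60.0
    day = ts.weekday()  # Monday = 0 ... Sunday = 6
    return {
        "day_of_week": day,
        "hour_of_day": ts.hour,
        "is_weekend": int(day >= 5),
        # Sine/cosine pairs place 23:59 next to 00:00 and Sunday next to Monday,
        # which a plain integer encoding cannot do.
        "hour_sin": math.sin(2 * math.pi * hour / 24),
        "hour_cos": math.cos(2 * math.pi * hour / 24),
        "dow_sin": math.sin(2 * math.pi * day / 7),
        "dow_cos": math.cos(2 * math.pi * day / 7),
    }


def lagged(series, lag):
    """Shift a series so element i holds the value from `lag` steps earlier.

    Positions with no earlier value are filled with None; assumes lag >= 0.
    """
    n = len(series)
    lag = min(lag, n)
    return [None] * lag + series[:n - lag]


f = time_features(datetime(2024, 3, 16, 18, 0))  # a Saturday evening
print(f["day_of_week"], f["is_weekend"])  # 5 1
print(lagged([1, 2, 3, 4], 1))            # [None, 1, 2, 3]
```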
Geographical
Pre-clustering
S2 Geo Points
Proximity to cities
MSA
Zip3
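A sketch of two of the geographical features above: Zip3 (the first three digits of a US ZIP code) and proximity to cities, read here as the great-circle distance to the nearest city in a caller-supplied table. The haversine formula and the dictionary layout are assumptions; the coordinates in the example are toy points, not real cities.

```python
import math


def zip3(zip_code):
    """First three digits of a US ZIP code, or None if it has fewer than 5 digits.

    Pass ZIP codes as strings: storing them as integers drops leading zeros.
    """
    digits = "".join(ch for ch in str(zip_code) if ch.isdigit())
    return digits[:3] if len(digits) >= 5 else None


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    # min() guards against rounding pushing `a` slightly above 1.
    return 2 * r * math.asin(math.sqrt(min(1.0, a)))


def nearest_city(lat, lon, cities):
    """Return (name, distance_km) for the closest entry in `cities`.

    `cities` is a caller-supplied dict mapping a name to (lat, lon).
    """
    return min(
        ((name, haversine_km(lat, lon, clat, clon)) for name, (clat, clon) in cities.items()),
        key=lambda pair: pair[1],
    )


print(zip3("02139-4307"))  # 021
toy_cities = {"A": (0.0, 0.0), "B": (0.0, 10.0)}  # hypothetical points
print(nearest_city(0.0, 1.0, toy_cities))  # nearest is 'A', about 111 km away
```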
Word-like and Text
tf.idf
Luduan terms
Semantic embeddings
GloVe https://nlp.stanford.edu/projects/glove/?source=post_page
Indicator detection
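A from-scratch sketch of tf.idf, which appears above under both categorical and text features. It uses one common variant (raw term count times log(N / df)); smoothed or sublinear variants are equally valid, and the function name is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute tf.idf weights for a list of tokenized documents.

    tf is the raw count of a term in a document; idf is log(N / df), where N is
    the number of documents and df is how many documents contain the term.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights


docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
w = tf_idf(docs)
print(w[0]["the"])  # 0.0, since "the" appears in every document
```

Terms that occur everywhere get zero weight, while terms concentrated in a few documents are boosted.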
IP Address
Reverse resolution
CIDR
CIDR prefix
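The IP address features above can be sketched with Python's standard `ipaddress` and `socket` modules. The helper names and the default /24 prefix length are assumptions; reverse resolution performs a live DNS query, so in practice its results would need caching.

```python
import ipaddress
import socket


def cidr_prefix(ip, prefix_len=24):
    """Network containing `ip` at the given prefix length, as a string."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


def in_cidr(ip, cidr):
    """True if `ip` falls inside the CIDR block `cidr`."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


def reverse_resolve(ip):
    """Hostname from a reverse DNS lookup, or None if the lookup fails.

    This makes a network call; cache results when used on large data sets.
    """
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


print(cidr_prefix("192.0.2.17"))                   # 192.0.2.0/24
print(cidr_prefix("192.0.2.17", 16))               # 192.0.0.0/16
print(in_cidr("198.51.100.7", "198.51.100.0/24"))  # True
```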
Missing Data
As a special value (unknown word)
Means
Reverse model
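A minimal sketch of two of the missing-data strategies above: filling numeric gaps with the mean, and treating a missing categorical value as its own special value. The extra missing-indicator column and the `<UNK>` token are illustrative choices, not something the list prescribes; the reverse-model approach is not shown.

```python
def impute_mean(values):
    """Replace None with the mean of observed values and add a missing flag.

    Returns (filled_values, missing_flags). Keeping the flag lets a model learn
    whether missingness itself is informative. Falls back to 0.0 if nothing
    was observed.
    """
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed) if observed else 0.0
    filled = [mean if v is None else v for v in values]
    flags = [int(v is None) for v in values]
    return filled, flags


def impute_special(values, token="<UNK>"):
    """For categorical data, treat a missing value as its own category."""
    return [token if v is None else v for v in values]


print(impute_mean([1.0, None, 3.0]))     # ([1.0, 2.0, 3.0], [0, 1, 0])
print(impute_special(["a", None, "b"]))  # ['a', '<UNK>', 'b']
```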
Consolidation
Unknown word
Stemming
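Consolidation into an unknown word can be sketched as a vocabulary built from training data, with everything outside it (rare values seen in training and new values seen later) mapped to a shared token. The `min_count` threshold, the token, and the function names are assumptions; stemming is not shown.

```python
from collections import Counter


def build_vocabulary(values, min_count=2):
    """Keep only categories seen at least `min_count` times in training data."""
    counts = Counter(values)
    return {v for v, c in counts.items() if c >= min_count}


def consolidate(values, vocabulary, token="<UNK>"):
    """Map any value outside the vocabulary to a shared unknown token."""
    return [v if v in vocabulary else token for v in values]


train = ["a", "b", "a", "c", "a", "b"]
vocab = build_vocabulary(train)         # {'a', 'b'}
print(consolidate(["a", "c", "z"], vocab))  # ['a', '<UNK>', '<UNK>']
```

Building the vocabulary once and reusing it means values never seen in training are handled the same way as rare ones.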
Parsing and Modeling
User agent
IP domains
Email address
Headers
Referrer
5P energy models
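A sketch of parsing two of the fields above (email address and referrer) into simpler features with the standard library. The particular features extracted are assumptions; the email split does no validation, and lowercasing the local part is a practical simplification rather than a strict reading of the email standards. User agent and header parsing are not shown.

```python
from urllib.parse import urlsplit


def email_features(address):
    """Split an email address into simple features, or None if there is no '@'."""
    user, sep, domain = address.strip().lower().rpartition("@")
    if not sep or not user or not domain:
        return None
    return {
        "user": user,
        "domain": domain,
        "tld": domain.rsplit(".", 1)[-1],
        "user_has_digits": any(ch.isdigit() for ch in user),
        "user_has_plus_tag": "+" in user,
    }


def referrer_host(referrer):
    """Lowercased host name of a referrer URL, or None if it has none."""
    return urlsplit(referrer).hostname


print(email_features("Alice+news@Example.COM")["domain"])  # example.com
print(referrer_host("https://www.Example.com/path?q=1"))    # www.example.com
```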
Scaling
Q scaling
Z scaling
Min-max scaling
Log
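Z scaling and min-max scaling from the list above, as a small sketch with the standard `statistics` module. Using the population standard deviation and returning 0.0 for constant inputs are choices made here for illustration; Q scaling is not shown.

```python
import statistics


def z_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd if sd else 0.0 for v in values]


def min_max_scale(values):
    """Map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


print(min_max_scale([2, 4, 6]))  # [0.0, 0.5, 1.0]
```

In a modeling pipeline the mean, standard deviation, minimum, and maximum would normally be computed on training data and reused unchanged for new data.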
Cross modeling
Other models
Modeled structure
Word2vec