AutoML_Alex
State-of-the-art Automated Machine Learning Python library for Tabular Data
Works with Tasks:
- [x] Binary Classification
- [x] Regression
- [ ] Multiclass Classification (in progress...)
Benchmark Results
[Figure: benchmark results]
Higher is better.
Source: AutoML-Benchmark
Scheme
[Figure: scheme]
Features
- Automated Data Clean (Auto Clean)
- Automated Feature Engineering (Auto FE)
- Smart Hyperparameter Optimization (HPO)
- Feature Generation
- Feature Selection
- Model Selection
- Cross Validation
- Optimization Time Limit and Early Stopping
- Save and Load (Predict new data)
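To make the time limit and early stopping items concrete, here is a minimal, library-independent sketch of a search loop that stops either when a time budget runs out or when no improvement has been seen for a number of trials. It is not AutoML_Alex's implementation; `random_search`, `patience`, and the toy objective are hypothetical names chosen for illustration only.

```python
import random
import time


def random_search(objective, sample_params, timeout=5.0, patience=20, seed=0):
    """Random search with a time limit and patience-based early stopping."""
    rng = random.Random(seed)
    deadline = time.monotonic() + timeout  # time limit in seconds
    best_score, best_params = float("-inf"), None
    trials_since_best = 0
    n_trials = 0

    # Stop when the time budget is spent or when `patience` trials in a row
    # fail to improve on the best score.
    while time.monotonic() < deadline and trials_since_best < patience:
        params = sample_params(rng)
        score = objective(params)
        n_trials += 1
        if score > best_score:
            best_score, best_params = score, params
            trials_since_best = 0
        else:
            trials_since_best += 1

    return best_params, best_score, n_trials


def toy_objective(params):
    # Toy objective (illustration only): the maximum, 0.0, is at x = 3.
    return -(params["x"] - 3.0) ** 2


def toy_sampler(rng):
    # Draw one candidate configuration.
    return {"x": rng.uniform(-10.0, 10.0)}


best_params, best_score, n_trials = random_search(
    toy_objective, toy_sampler, timeout=2.0, patience=50
)
print(best_params, best_score, n_trials)
```

The two stopping rules are independent: the time limit bounds the total cost, while early stopping ends the search sooner once further trials stop paying off.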
Installation
pip install automl-alex
Docs
🚀 Examples
Classifier:
from automl_alex import AutoMLClassifier
model = AutoMLClassifier()
model.fit(X_train, y_train, timeout=600)
predicts = model.predict(X_test)
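The snippets in this section assume that `X_train`, `X_test`, and `y_train` already exist. As a minimal sketch, one way to produce them is with scikit-learn's bundled breast cancer dataset (a binary classification task) and `train_test_split`. The dataset choice, the 80/20 split, and the use of pandas DataFrames (via `as_frame=True`, which needs pandas installed) are assumptions for illustration, not requirements stated by this README.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Small binary-classification dataset that ships with scikit-learn (no download).
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Hold out 20% of the rows for testing; stratify to keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```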
Regression:
from automl_alex import AutoMLRegressor
model = AutoMLRegressor()
model.fit(X_train, y_train, timeout=600)
predicts = model.predict(X_test)
DataPrepare:
from automl_alex import DataPrepare
de = DataPrepare()
X_train = de.fit_transform(X_train)
X_test = de.transform(X_test)
Simple Models Wrapper:
from automl_alex import LightGBMClassifier
model = LightGBMClassifier()
model.fit(X_train, y_train)
predicts = model.predict_proba(X_test)
model.opt(
    X_train, y_train,
    timeout=600,  # optimization time in seconds
)
predicts = model.predict_proba(X_test)
More examples in the folder ./examples:
- 01_Quick_Start.ipynb
- 02_Data_Cleaning_and_Encoding_(DataPrepare).ipynb
- 03_Models.ipynb
- 04_ModelsReview.ipynb
- 05_BestSingleModel.ipynb
- Production Docker template
What's inside
It integrates many popular frameworks:
- scikit-learn
- XGBoost
- LightGBM
- CatBoost
- Optuna
- ...
Works with Features
- [x] Categorical Features
- [x] Numerical Features
- [x] Binary Features
- [ ] Text
- [ ] Datetime
- [ ] Timeseries
- [ ] Image
Note
- Large datasets require a lot of memory, because the library creates many new features. If your dataset has many rows and a large number of features (more than 100), make sure enough memory is available.
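For a rough sense of scale, the sketch below estimates the size of a dense numeric table before and after feature generation. The helper `dense_memory_gib`, the 8 bytes per value (float64), and the 5x column growth are all assumptions for illustration; the actual number of generated features depends on the data and settings.

```python
def dense_memory_gib(n_rows, n_columns, bytes_per_value=8):
    """Approximate size in GiB of a dense table of fixed-width numeric values."""
    return n_rows * n_columns * bytes_per_value / 2**30


# One million rows with 100 float64 features: about 0.75 GiB.
raw = dense_memory_gib(1_000_000, 100)

# Hypothetical: feature generation grows the table to 500 columns (5x).
expanded = dense_memory_gib(1_000_000, 500)

print(f"raw: {raw:.2f} GiB, expanded: {expanded:.2f} GiB")
```

Intermediate copies made during training come on top of this figure, so it is better read as a lower bound.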
Realtime Dashboard
Works with optuna-dashboard
[Figure: Dashboard]
Run
$ optuna-dashboard sqlite:///db.sqlite3
Road Map
- [x] Feature Generation
- [x] Save/Load and Predict on New Samples
- [x] Advanced Logging
- [x] Add opt Pruners
- [x] Docs Site
- [ ] DL Encoders
- [ ] Add More libs (NNs)
- [ ] Multiclass Classification
- [ ] Build pipelines