Add mlforgex
What is mlforgex?
mlforgex is an end-to-end machine learning automation package for Python. It automates the ML workflow, from data preprocessing through model training, evaluation, and saved artifacts for deployment, so you can train, evaluate, and make predictions with a single command or through the Python API.
Main Features
- Automatic Data Preprocessing: Handles missing values, outliers, duplicate removal, categorical encoding, scaling, and multicollinearity (VIF) detection automatically.
- Automatic Problem Detection: Detects whether your task is classification (binary/multiclass), regression, or NLP, with no manual configuration needed.
- NLP Pipeline: Full text preprocessing pipeline with tokenization, stopword removal, lemmatization, Word2Vec vectorization, and saved artifacts for reproducible inference.
- Imbalanced Data Handling: Auto-detects class imbalance and applies SMOTE or under-sampling inside cross-validation folds to prevent data leakage.
- Model Training & Selection: Trains a curated pool of candidate models and automatically selects the best performer using composite scoring (customizable F1/RMSE weights).
- Hyperparameter Tuning: RandomizedSearchCV with configurable iterations and cross-validation folds; optional fast mode skips tuning for rapid prototyping.
- Interactive Dashboard: Single HTML dashboard (Dashboard.html) aggregates all Plotly visualizations, metrics, model comparison table, and run configuration.
- Reproducible Artifacts: Saves model, preprocessing pipeline, encoders, Word2Vec models, and metrics for production deployment and full reproducibility.
- Visualizations: Correlation heatmap, confusion matrix, ROC/Precision-Recall curves, learning curves, residual plots, feature importance, and word clouds.
- CLI & Python API: Two interfaces. Use the command line for quick training or the Python API for programmatic control.
What's the difference between mlforgex and similar AutoML tools?
Unlike generic AutoML packages, mlforgex offers:
1. Unified NLP Support
Most AutoML tools lack native NLP pipelines. mlforgex includes Word2Vec vectorization, text preprocessing, and saved models ready for production inference.
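To make the text-to-vector step concrete, here is a minimal sketch using only the standard library. It is not mlforgex's implementation: the stopword list, the `toy_embeddings` table (a stand-in for a trained Word2Vec model), and the choice to average word vectors into one document vector are all assumptions made for illustration, and lemmatization is omitted.

```python
import re

# Tiny illustrative stopword list (assumption; real pipelines use a fuller list).
STOPWORDS = {"the", "a", "an", "is", "and", "of", "to"}

def tokenize(text):
    """Lowercase, split into word tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]

def document_vector(tokens, embeddings, dim):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

# Hypothetical 2-dimensional embeddings standing in for a trained Word2Vec model.
toy_embeddings = {"model": [1.0, 0.0], "training": [0.0, 1.0]}

tokens = tokenize("The training of a model is fast")
vector = document_vector(tokens, toy_embeddings, dim=2)
```

Saving the embedding table and the stopword list alongside the model is what lets inference apply the same transformation that was used in training.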
2. Single Interactive Dashboard
Instead of scattered plot files, mlforgex generates a single Dashboard.html with responsive Plotly charts, a model comparison table, and run metadata, so no manual aggregation is needed.
3. Smart Problem Auto-Detection
Automatically distinguishes classification vs regression without user hints. Detects binary vs multiclass and adjusts metrics accordingly.
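One plausible way to infer the task from the target column is sketched below. This is an illustrative heuristic, not mlforgex's actual detection logic; the function name `infer_problem_type` and the `max_classes` threshold are hypothetical.

```python
def infer_problem_type(target, max_classes=20):
    """Guess the task type from target values (illustrative heuristic only)."""
    values = [v for v in target if v is not None]
    if not values:
        raise ValueError("target column is empty")
    distinct = set(values)
    # Strings and booleans are always treated as class labels.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if numeric:
        all_integral = all(float(v).is_integer() for v in values)
        # Fractional values or many distinct values suggest a continuous target.
        if not all_integral or len(distinct) > max_classes:
            return "regression"
    if len(distinct) <= 2:
        return "binary_classification"
    return "multiclass_classification"
```

For example, `infer_problem_type([0, 1, 1, 0])` yields a binary task, while a float-valued target such as `[1.5, 2.25, 3.0]` is treated as regression. A threshold like `max_classes` is a judgment call: integer targets with many distinct values (counts, ages) are ambiguous, so a real tool would likely let the user override the guess.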
4. Leak-Free Resampling
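Resampling (SMOTE or under-sampling) is applied only to the training portion of each cross-validation fold, so resampled rows never reach the validation data. Resampling before the split is a common mistake in AutoML pipelines and leaks information into validation scores.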
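The idea of resampling inside folds can be sketched with the standard library alone. The sketch below is an assumption-laden illustration, not mlforgex's code: plain random oversampling stands in for SMOTE, and the helper names `oversample_minority` and `leak_free_folds` are hypothetical.

```python
import random

def oversample_minority(rows, labels, rng):
    """Randomly duplicate rows of smaller classes until all classes are equal in size."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels

def leak_free_folds(rows, labels, k=3, seed=0):
    """Yield (train_rows, train_labels, val_rows, val_labels) per fold.

    The split happens first; only the training part is resampled, so the
    validation rows are always untouched original data.
    """
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in folds:
        val_set = set(held_out)
        train_idx = [i for i in indices if i not in val_set]
        train_rows, train_labels = oversample_minority(
            [rows[i] for i in train_idx], [labels[i] for i in train_idx], rng
        )
        val_rows = [rows[i] for i in held_out]
        val_labels = [labels[i] for i in held_out]
        yield train_rows, train_labels, val_rows, val_labels
```

The key design point is ordering: because duplicates are drawn only from rows already assigned to the training split, no copy of a validation row can influence the model being scored on it.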
5. Customizable Composite Scoring
Rank models by weighted combinations of metrics (f1_prob, rmse_prob), not just a single metric. Fine-tune model selection to your use case.
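A weighted ranking of this kind might look like the following sketch. The exact semantics of `f1_prob` and `rmse_prob` are not specified here, so the example uses a generic, hypothetical `rank_models` helper with min-max normalization; the metric names and numbers are made up for illustration.

```python
def rank_models(results, weights, lower_is_better=()):
    """Rank candidates by a weighted sum of min-max normalized metrics."""
    scores = {name: 0.0 for name in results}
    for metric, weight in weights.items():
        values = [m[metric] for m in results.values()]
        lo, hi = min(values), max(values)
        span = hi - lo
        for name, m in results.items():
            # Identical values across candidates carry no ranking information.
            norm = 0.5 if span == 0 else (m[metric] - lo) / span
            if metric in lower_is_better:
                norm = 1.0 - norm  # flip so that a higher score is always better
            scores[name] += weight * norm
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up regression results for three hypothetical candidates.
results = {
    "model_a": {"r2": 0.80, "rmse": 4.0},
    "model_b": {"r2": 0.90, "rmse": 5.0},
    "model_c": {"r2": 0.70, "rmse": 6.0},
}

rmse_heavy = rank_models(results, {"r2": 0.3, "rmse": 0.7}, lower_is_better={"rmse"})
r2_heavy = rank_models(results, {"r2": 0.9, "rmse": 0.1}, lower_is_better={"rmse"})
```

With the RMSE-heavy weights `model_a` ranks first, while the R²-heavy weights put `model_b` first, which is the practical effect of tuning the weights to a use case.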
6. Fast Mode
Optional --fast flag skips hyperparameter tuning and uses robust defaults—perfect for rapid iteration when compute is limited.
7. Complete Artifact Reproducibility
Saves the preprocessing pipeline, encoders, Word2Vec models, and metrics so that predictions on new data use exactly the same pipeline as training.
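The principle behind saved preprocessing artifacts can be shown with a small standard-library sketch. It does not reflect mlforgex's actual file layout or formats: the file names `preprocessing.json` and `metrics.json`, the helper names, and the single scaling step are assumptions chosen to keep the example short.

```python
import json
import statistics
import tempfile
from pathlib import Path

def fit_scaler(values):
    """Learn scaling parameters from training data only."""
    return {"mean": statistics.fmean(values), "std": statistics.pstdev(values) or 1.0}

def apply_scaler(params, values):
    """Apply previously fitted parameters; nothing is re-estimated here."""
    return [(v - params["mean"]) / params["std"] for v in values]

def save_artifacts(directory, scaler_params, metrics):
    """Persist fitted parameters and metrics (hypothetical file names)."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    (directory / "preprocessing.json").write_text(json.dumps(scaler_params))
    (directory / "metrics.json").write_text(json.dumps(metrics))

def load_scaler(directory):
    return json.loads((Path(directory) / "preprocessing.json").read_text())

train = [10.0, 20.0, 30.0]
with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(tmp, fit_scaler(train), {"rmse": 1.23})
    restored = load_scaler(tmp)
    # New data is scaled with the training mean and std, not its own statistics.
    new_scaled = apply_scaler(restored, [20.0, 40.0])
```

Refitting the scaler on the new data would silently change what the model sees; loading the stored parameters avoids that.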
8. Minimal Configuration, Maximum Control
Sensible defaults work out-of-the-box (one command trains a full pipeline), but advanced users can tweak preprocessing, tuning iterations, cross-validation folds, and NLP settings via flags.
9. No Hidden Magic
Preprocessing steps and model selection logic are documented, so you can see what the package does at each stage.
10. Production-Ready Output
Generates serialized models, metrics reports, and dashboards that can be deployed to production or used directly in data science presentations.