decision-forests
Auto-train, auto-tune & auto-serve the best TF-DF model directly from CSV files
As my GSoC contributions are nearly complete, I'm using the remaining time for some additional work: a layer on top of TF-DF that:
- auto-trains TF-DF models directly from CSV files
- auto-tunes them using Keras Tuner
- auto-serves the best TF-DF model using FastAPI
This should help make TF-DF a go-to choice for tabular data in the Kaggle community, where most training data comes in CSV format.
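To make the "train directly from CSV" step concrete, here is a minimal sketch of what that layer might do under the hood. The function name `train_from_csv`, its parameters, and the 80/20 split are hypothetical choices for illustration, not the actual design. It assumes a classification task with no missing labels, and uses only the public `tfdf.keras` calls from the TF-DF tutorials.

```python
def train_from_csv(csv_path, label, test_fraction=0.2, seed=0):
    """Train a TF-DF model from a CSV file and return (model, metrics).

    Hypothetical helper: the name, arguments, and split strategy are
    assumptions for illustration only.
    """
    # Imported lazily so this sketch can be loaded without TF-DF installed.
    import pandas as pd
    import tensorflow_decision_forests as tfdf

    df = pd.read_csv(csv_path)

    # Encode string labels as integers (assumes a classification task
    # and no missing values in the label column).
    if df[label].dtype == object:
        classes = sorted(df[label].unique())
        df[label] = df[label].map(classes.index)

    # Simple random hold-out split.
    test_df = df.sample(frac=test_fraction, random_state=seed)
    train_df = df.drop(test_df.index)

    train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label=label)
    test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(test_df, label=label)

    model = tfdf.keras.GradientBoostedTreesModel()
    model.compile(metrics=["accuracy"])
    model.fit(train_ds)

    metrics = model.evaluate(test_ds, return_dict=True)
    return model, metrics
```

Usage would look like `model, metrics = train_from_csv("train.csv", label="target")`, where the file name and column name are placeholders. The tuning and serving steps would wrap this function rather than replace it.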
That sounds awesome @rishiraj !! This tutorial may be a starting point.
Some optional suggestions/ideas:
- For small datasets, add support for n-fold cross-validation. This makes the best use of the available data, and because training on small datasets is so fast, the extra folds cost very little.
- For large datasets, two suggestions we often make here:
  - Do most of the hyperparameter tuning on a smaller (sub-sampled) dataset for speed. The result may not be optimal, but resource constraints often make this the more feasible option. Once the best parameters are found, train on the whole dataset.
  - Parallelize the hyperparameter tuning across multiple machines. This requires know-how of whichever cloud solution you use. The complication here is not ML, but starting the jobs and collecting the results.
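The cross-validation and sub-sampling suggestions above can be sketched roughly as follows. This is a library-agnostic illustration, not TF-DF API: `kfold_indices`, `cross_val_score`, `tune_on_subsample`, and the `fit_and_score` callback are all hypothetical names. `fit_and_score(params, train_rows, valid_rows)` is assumed to train a model with the given hyperparameters and return a score where higher is better.

```python
import random
from statistics import mean
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def kfold_indices(n_rows: int, n_folds: int,
                  seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, valid_indices) for each of n_folds folds."""
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)
    for k in range(n_folds):
        valid = order[k::n_folds]  # every n_folds-th row, offset by k
        held_out = set(valid)
        train = [i for i in order if i not in held_out]
        yield train, valid


def cross_val_score(rows: Sequence, params: Dict, fit_and_score: Callable,
                    n_folds: int = 5, seed: int = 0) -> float:
    """Average the validation score of `params` over all folds."""
    scores = []
    for train_idx, valid_idx in kfold_indices(len(rows), n_folds, seed):
        train = [rows[i] for i in train_idx]
        valid = [rows[i] for i in valid_idx]
        scores.append(fit_and_score(params, train, valid))
    return mean(scores)


def tune_on_subsample(rows: Sequence, candidates: Sequence[Dict],
                      fit_and_score: Callable, sample_fraction: float = 0.1,
                      n_folds: int = 5, seed: int = 0) -> Dict:
    """Pick the best candidate using cross-validation on a random subsample."""
    rng = random.Random(seed)
    sample_size = min(len(rows), max(n_folds, int(len(rows) * sample_fraction)))
    sample = rng.sample(list(rows), sample_size)
    return max(
        candidates,
        key=lambda p: cross_val_score(sample, p, fit_and_score, n_folds, seed),
    )
```

After `tune_on_subsample` returns the winning hyperparameters, the final model would be trained once on the full dataset. For small datasets, `cross_val_score` can be called on all rows directly, skipping the sub-sampling step.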
cheers!