ludwig
Any reason not to have `automl` subcommand supported?
Is your feature request related to a problem? Please describe.
I could only find the `init_config` subcommand, which is equivalent to `ludwig.automl.create_auto_config`, but I'd like a command that kicks off an AutoML job directly, such as `automl`, which would be equivalent to `ludwig.automl.auto_train`.
root@22b9afe42cc3:/data# ludwig --help
NumExpr defaulting to 4 threads.
usage: ludwig <command> [<args>]
Available sub-commands:
train Trains a model
predict Predicts using a pretrained model
evaluate Evaluate a pretrained model's performance
experiment Runs a full experiment training a model and evaluating it
hyperopt Perform hyperparameter optimization
serve Serves a pretrained model
visualize Visualizes experimental results
collect_summary Prints names of weights and layers activations to use with other collect commands
collect_weights Collects tensors containing a pretrained model weights
collect_activations Collects tensors for each datapoint using a pretrained model
datasets Downloads and lists Ludwig-ready datasets
export_torchscript Exports Ludwig models to Torchscript
export_triton Exports Ludwig models to Triton
export_neuropod Exports Ludwig models to Neuropod
export_mlflow Exports Ludwig models to MLflow
preprocess Preprocess data and saves it into HDF5 and JSON format
synthesize_dataset Creates synthetic data for testing purposes
init_config Initialize a user config from a dataset and targets
render_config Renders the fully populated config with all defaults set
ludwig cli runner
positional arguments:
command Subcommand to run
optional arguments:
-h, --help show this help message and exit
root@22b9afe42cc3:/data#
Describe the use case
As a user, I want AutoML to be supported natively by the CLI so that I can quickly trigger a job. Right now, I have to load the dataset and write a small program to start the job, like the one below.
import logging
import pprint
from load_util import load_mushroom_edibility
from ludwig.automl import auto_train
mushroom_edibility_df = load_mushroom_edibility()
auto_train_results = auto_train(
    dataset=mushroom_edibility_df,
    target='class',
    time_limit_s=7200,
    tune_for_memory=False,
)
pprint.pprint(auto_train_results)
Describe the solution you'd like
ludwig automl --dataset xxx.csv --target "class" --time_limit_s=7200 --hyperopt=true --tune_for_memory=True
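To make the request concrete, here is a rough sketch of what such a subcommand could wrap. It is not Ludwig's actual implementation: the `build_parser` and `main` helpers are hypothetical names, the dataset is assumed to be a CSV file loaded with pandas, and only the `auto_train` arguments used in my script above are forwarded. The `--hyperopt` flag is left out because I'm not sure which `auto_train` parameter it should map to.

```python
import argparse
import pprint


def build_parser() -> argparse.ArgumentParser:
    """Build the parser for a hypothetical `ludwig automl` subcommand."""
    parser = argparse.ArgumentParser(
        prog="ludwig automl",
        description="Hypothetical wrapper around ludwig.automl.auto_train",
    )
    parser.add_argument("--dataset", required=True, help="Path to a CSV dataset")
    parser.add_argument("--target", required=True, help="Name of the target column")
    parser.add_argument(
        "--time_limit_s", type=int, required=True, help="Time budget in seconds"
    )
    parser.add_argument(
        "--tune_for_memory",
        action="store_true",
        help="Forwarded to auto_train as tune_for_memory=True",
    )
    return parser


def main(argv=None):
    """Parse CLI arguments and forward them to auto_train."""
    args = build_parser().parse_args(argv)

    # Imported lazily so that `--help` works without loading pandas or Ludwig.
    import pandas as pd
    from ludwig.automl import auto_train

    # Assumption: the dataset is a CSV file; it is loaded into a DataFrame,
    # mirroring the script above, which passes a DataFrame to auto_train.
    results = auto_train(
        dataset=pd.read_csv(args.dataset),
        target=args.target,
        time_limit_s=args.time_limit_s,
        tune_for_memory=args.tune_for_memory,
    )
    pprint.pprint(results)
    return results
```

With something like this registered as a subcommand, calling `main(["--dataset", "xxx.csv", "--target", "class", "--time_limit_s", "7200"])` would do the same thing as my script above.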
Describe alternatives you've considered
N/A
Additional context
N/A
Thanks for raising this issue @Jeffwan. This should be relatively quick to implement, so we'll see if we can get it added for v0.6.
I think this is a great idea -- similarly, I filed #1934 requesting the same. Including this in 0.6 SGTM.