
Any reason not to have `automl` subcommand supported?

Open Jeffwan opened this issue 2 years ago • 2 comments

Is your feature request related to a problem? Please describe.

I can only find the `init_config` subcommand, which is equivalent to `ludwig.automl.create_auto_config`. I'd like a command that kicks off an AutoML job directly, such as `automl`, which would be equivalent to `ludwig.automl.auto_train`.

root@22b9afe42cc3:/data# ludwig --help
NumExpr defaulting to 4 threads.
usage: ludwig <command> [<args>]

Available sub-commands:
   train                 Trains a model
   predict               Predicts using a pretrained model
   evaluate              Evaluate a pretrained model's performance
   experiment            Runs a full experiment training a model and evaluating it
   hyperopt              Perform hyperparameter optimization
   serve                 Serves a pretrained model
   visualize             Visualizes experimental results
   collect_summary       Prints names of weights and layers activations to use with other collect commands
   collect_weights       Collects tensors containing a pretrained model weights
   collect_activations   Collects tensors for each datapoint using a pretrained model
   datasets              Downloads and lists Ludwig-ready datasets
   export_torchscript    Exports Ludwig models to Torchscript
   export_triton         Exports Ludwig models to Triton
   export_neuropod       Exports Ludwig models to Neuropod
   export_mlflow         Exports Ludwig models to MLflow
   preprocess            Preprocess data and saves it into HDF5 and JSON format
   synthesize_dataset    Creates synthetic data for testing purposes
   init_config           Initialize a user config from a dataset and targets
   render_config         Renders the fully populated config with all defaults set

ludwig cli runner

positional arguments:
  command     Subcommand to run

optional arguments:
  -h, --help  show this help message and exit
root@22b9afe42cc3:/data#

Describe the use case

As a user, I want AutoML supported natively by the CLI so that I can quickly trigger a job. Right now, I have to load the dataset and write a small program to start the job, like the one below.

import logging
import pprint

from load_util import load_mushroom_edibility
from ludwig.automl import auto_train

mushroom_edibility_df = load_mushroom_edibility()

auto_train_results = auto_train(
    dataset=mushroom_edibility_df,
    target='class',
    time_limit_s=7200,
    tune_for_memory=False
)

pprint.pprint(auto_train_results)

Describe the solution you'd like

ludwig automl --dataset xxx.csv --target "class" --time_limit_s=7200 --hyperopt=true --tune_for_memory=True
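Until such a subcommand exists, the command above could be approximated by a small standalone wrapper. The following is only a sketch under stated assumptions, not Ludwig's actual CLI: the script name `automl_cli.py`, the `str2bool` helper, and the `build_parser` and `main` functions are hypothetical. It forwards only the `auto_train` arguments used in the script above, and it leaves out `--hyperopt` because that script does not show a corresponding `auto_train` argument.

```python
import argparse
import pprint


def str2bool(value: str) -> bool:
    """Parse 'True'/'false'-style strings.

    argparse's type=bool treats every non-empty string as True,
    so an explicit parser is needed for --tune_for_memory=False.
    """
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags mirroring the proposed `ludwig automl` command.
    parser = argparse.ArgumentParser(
        prog="automl_cli.py",
        description="Run ludwig.automl.auto_train on a CSV file",
    )
    parser.add_argument("--dataset", required=True, help="path to a CSV file")
    parser.add_argument("--target", required=True, help="name of the target column")
    parser.add_argument("--time_limit_s", type=int, default=7200,
                        help="time budget for the AutoML run, in seconds")
    parser.add_argument("--tune_for_memory", type=str2bool, default=False)
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)

    # Imported here so that --help and argument errors stay fast
    # and do not require Ludwig to be importable.
    import pandas as pd
    from ludwig.automl import auto_train

    # Same call as in the script above, with values taken from the CLI.
    results = auto_train(
        dataset=pd.read_csv(args.dataset),
        target=args.target,
        time_limit_s=args.time_limit_s,
        tune_for_memory=args.tune_for_memory,
    )
    pprint.pprint(results)
```

Calling `main(["--dataset", "xxx.csv", "--target", "class", "--time_limit_s=7200", "--tune_for_memory=True"])`, or calling `main()` from a script entry point, would then start the job with the same arguments as the proposed command.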

Describe alternatives you've considered N/A

Additional context N/A

Jeffwan, Jun 14 '22 06:06

Thanks for raising this issue @Jeffwan. This should be relatively quick to implement, so we'll see if we can get it added for v0.6.

tgaddair, Jun 14 '22 16:06

I think this is a great idea -- similarly, I filed #1934 requesting the same. Including this in 0.6 SGTM.

justinxzhao, Jun 14 '22 16:06