
Checkpointing at every model tuning stage

Open Abhilash2000 opened this issue 3 years ago • 1 comment

Hello,

Is there a way to checkpoint and save the model at every stage of the AutoML training process? If not, where in the code could we modify it to do so?

Thanks.

Abhilash2000 avatar Aug 23 '22 05:08 Abhilash2000

A couple of thoughts:

  1. We save each tried configuration to a log file when log_file_name is specified. The corresponding configuration can then be retrained afterwards using AutoML.retrain_from_log().
  2. When mlflow is installed and AutoML.fit() is called within an mlflow run, we also use mlflow to log the configurations and metrics of each trial. That can be extended to log the trained model as well, if desired: https://github.com/microsoft/FLAML/blob/e5c8a16fabc74098751598950911d7b319f0573b/flaml/automl.py#L3069

Relevant doc: https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML#log-the-trials

sonichi avatar Aug 23 '22 16:08 sonichi