
[FEATURE] Option for not saving checkpoint

Open · psinger opened this issue 1 year ago · 1 comment

🚀 Feature

It would be helpful to have a setting that disables checkpoint saving, e.g. for tests or benchmark runs, so they don't fill up the local disk.

This is especially useful for the CLI; we might also consider exposing it as an option in Wave.

psinger · Apr 29 '24 12:04

I actually implemented this on our h2o instance. It's super helpful for hyperparameter sweeps, where you don't want the time and storage overhead of materializing the models.

Another helpful setting might be to save only the last checkpoint. That lets you run frequent evaluations to track training progress without paying the checkpointing overhead each time (which is particularly high when running on multiple GPUs with DeepSpeed).
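Roughly, the two settings could be combined into a single option. This is only a sketch of the idea, not what LLM Studio actually does: `CheckpointMode`, `should_save`, and `planned_saves` are made-up names, and it assumes the current behavior is to write a checkpoint at every evaluation.

```python
from enum import Enum
from typing import List


class CheckpointMode(str, Enum):
    """Hypothetical setting values; not taken from the LLM Studio config."""

    EVERY_EVAL = "every_eval"  # assumed current behavior: save at each evaluation
    LAST = "last"              # save only once, after the final evaluation
    DISABLE = "disable"        # never write a checkpoint (tests, benchmarks, sweeps)


def should_save(mode: CheckpointMode, eval_index: int, num_evals: int) -> bool:
    """Decide whether to write a checkpoint after evaluation `eval_index` (0-based)."""
    if mode is CheckpointMode.DISABLE:
        return False
    if mode is CheckpointMode.LAST:
        return eval_index == num_evals - 1
    return True


def planned_saves(mode: CheckpointMode, num_evals: int) -> List[int]:
    """Return the evaluation indices at which a checkpoint would be written."""
    return [i for i in range(num_evals) if should_save(mode, i, num_evals)]


if __name__ == "__main__":
    # Compare how many checkpoints each mode would write over 4 evaluations.
    for mode in CheckpointMode:
        print(mode.value, planned_saves(mode, num_evals=4))
```

With `LAST`, evaluation can still run as often as you like; only the final one triggers a write, which is where the savings with DeepSpeed would come from.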

tmostak · Apr 29 '24 20:04