h2o-llmstudio
[FEATURE] Option for not saving checkpoint
🚀 Feature
It would be helpful to have a setting that disables checkpoint saving, for example in tests or benchmark runs, so they don't fill up the local disk.
This is especially useful for the CLI; it might also be worth exposing as an option in Wave.
I actually implemented this on our h2o instance. It's super helpful for hyperparameter sweeps, where you don't want the time and storage overhead of materializing the models.
Another helpful setting might be to save only the last checkpoint. That would allow frequent evaluation runs to track training progress without the time overhead of checkpointing the model at each one (which is particularly high when running on multiple GPUs with DeepSpeed).
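To make the request concrete, here is a rough sketch of how the two proposed settings could be combined into one policy. The names `SaveCheckpoint` and `should_save` and the option values are hypothetical, made up for illustration; they are not existing LLM Studio settings, and the real training loop may be structured differently.

```python
from enum import Enum


class SaveCheckpoint(str, Enum):
    """Hypothetical checkpoint policy (illustration only, not an existing setting)."""

    EVERY_EVAL = "every_eval"  # write a checkpoint after every evaluation run
    LAST = "last"              # write a checkpoint only after the final evaluation
    DISABLE = "disable"        # never write a checkpoint (tests, benchmarks, sweeps)


def should_save(policy: SaveCheckpoint, eval_index: int, total_evals: int) -> bool:
    """Decide whether to write a checkpoint after evaluation number `eval_index`.

    `eval_index` is 1-based; `total_evals` is the number of evaluations planned.
    """
    if policy is SaveCheckpoint.DISABLE:
        return False
    if policy is SaveCheckpoint.LAST:
        return eval_index == total_evals
    return True


# With four evaluation runs and the "last" policy, only run 4 writes to disk.
saved_at = [i for i in range(1, 5) if should_save(SaveCheckpoint.LAST, i, 4)]
```

A single option with three values, rather than two separate booleans, would avoid contradictory combinations such as "disable saving" together with "save last only".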