diffusers

train_dreambooth.py: Option to delete previous step saves when using --save_interval, to save Google Drive space.

Open nbanyan opened this issue 1 year ago • 4 comments

Storage space can be an issue when training models, especially when using Google Colab and saving the model to Google Drive so it isn't lost when the Colab session disconnects. It would help if there were an option to delete the previous step saves (or to keep only the most recent N saves).

Proposed Example:

accelerate launch train_dreambooth.py \
...
  --num_class_images=50 \
  --sample_batch_size=4 \
  --keep_last_n_saves=2 \
  --max_train_steps=15000 \
  --save_interval=500

This would save the model every 500 steps but keep only the two most recent saves, so that upon completion only the 14500-step and 15000-step models would be in Google Drive.
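To make the requested behavior concrete, here is a minimal Python sketch of the pruning step. It is not the script's actual implementation: the function names `prune_old_saves` and `simulate_run` are hypothetical, and it assumes each save is written to a folder named after its step number inside the output directory.

```python
import re
import shutil
import tempfile
from pathlib import Path


def prune_old_saves(output_dir, keep_last_n):
    """Delete all but the `keep_last_n` highest-numbered save folders.

    Assumption: every save lives in a sub-folder whose name is its step
    number (e.g. "14500"). Anything else in `output_dir` is left alone.
    Returns the step numbers that were removed, oldest first.
    """
    if keep_last_n < 1:
        raise ValueError("keep_last_n must be at least 1")
    saves = sorted(
        (int(p.name), p)
        for p in Path(output_dir).iterdir()
        if p.is_dir() and re.fullmatch(r"[0-9]+", p.name)
    )
    stale = saves[:-keep_last_n]  # everything except the newest N
    for _, path in stale:
        shutil.rmtree(path)
    return [step for step, _ in stale]


def simulate_run(max_train_steps, save_interval, keep_last_n):
    """Mimic a training run with empty folders and return the surviving steps."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for step in range(save_interval, max_train_steps + 1, save_interval):
            (root / str(step)).mkdir()          # stand-in for saving the model
            prune_old_saves(root, keep_last_n)  # prune right after each save
        return sorted(int(p.name) for p in root.iterdir())


if __name__ == "__main__":
    print(simulate_run(max_train_steps=15000, save_interval=500, keep_last_n=2))
```

Pruning immediately after each save means the drive never holds more than N saves once the new one is written, though it briefly needs room for N+1 while that save is in progress.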

nbanyan • Nov 15 '22 14:11

cc @patil-suraj

patrickvonplaten • Nov 17 '22 16:11

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] • Dec 15 '22 15:12

I think this should be resolved by https://github.com/huggingface/diffusers/pull/1668

patrickvonplaten • Dec 19 '22 16:12

> I think this should be resolved by #1668

Not yet. We still have to leverage accelerate to do it if it's the latest version. I'll fix it soon.

pcuenca • Dec 20 '22 07:12

Gentle ping here, @pcuenca

patrickvonplaten • Jan 16 '23 11:01

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] • Feb 09 '23 15:02

Is this PR still open? Maybe @williamberman could take a look if you're too busy, @pcuenca? :-)

patrickvonplaten • Feb 13 '23 11:02

@nbanyan the CLI arg --checkpoints_total_limit will now limit the total number of checkpoints saved. Sorry for the delay here :)

williamberman • Feb 16 '23 08:02