Add training options
Continue training existing model
Modified train_unconditional.py so the model is reloaded if it already exists (see the sketch after the sample output below):
export COMMAND="python examples/train_unconditional.py --resolution 32 --num_epochs 10 --train_data_dir training-images --output_dir model"
% ${COMMAND}
creating fresh model
....
% ${COMMAND}
reloading model from model/unet
reloaded model from model/unet
....
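A minimal sketch of the reload-or-create step, assuming the script builds a diffusers UNet2DModel; the helper name and the fresh-model config are illustrative, not the actual code in the PR:

```python
import os

from diffusers import UNet2DModel  # model class used by the unconditional training example


def load_or_create_model(output_dir: str, resolution: int) -> UNet2DModel:
    """Reload the UNet saved by a previous run if present, otherwise start fresh."""
    unet_dir = os.path.join(output_dir, "unet")
    if os.path.isdir(unet_dir):
        print(f"reloading model from {unet_dir}")
        model = UNet2DModel.from_pretrained(unet_dir)
        print(f"reloaded model from {unet_dir}")
    else:
        print("creating fresh model")
        # illustrative config only; the real script defines its own block layout
        model = UNet2DModel(sample_size=resolution, in_channels=3, out_channels=3)
    return model
```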
Checkpoint periodically on model save
Checkpoint a copy of the model on save:
% ${COMMAND} --checkpoint_model_epochs 10
...
checkpointed model/unet to model/checkpoints/checkpoint-2022-08-18+17-17-48
...
The argument should be n * save_model_epochs, where n is an integer > 0; pass 0 to disable checkpointing (the default).
% tree model/checkpoints
model/checkpoints
└── checkpoint-2022-08-18+17-17-48
├── config.json
└── diffusion_pytorch_model.bin
1 directory, 2 files
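A sketch of the checkpoint-on-save step; the function and variable names and the epoch check are assumptions, while the directory layout and log line follow the output above:

```python
import os
import shutil
from datetime import datetime


def maybe_checkpoint(output_dir: str, epoch: int, checkpoint_model_epochs: int) -> None:
    """Copy the freshly saved model/unet to model/checkpoints/checkpoint-<timestamp>."""
    if checkpoint_model_epochs <= 0:  # 0 disables checkpointing (the default)
        return
    if (epoch + 1) % checkpoint_model_epochs != 0:
        return
    stamp = datetime.now().strftime("%Y-%m-%d+%H-%M-%S")
    src = os.path.join(output_dir, "unet")
    dst = os.path.join(output_dir, "checkpoints", f"checkpoint-{stamp}")
    shutil.copytree(src, dst)
    print(f"checkpointed {src} to {dst}")
```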
Timestamp test_samples
The test_samples are also timestamped to allow visual inspection over time:
% ${COMMAND} --timestamp_test_samples
The sample directories show up with names like:
- test_samples-2022-08-18+17-49-52+000000
- test_samples-2022-08-18+17-55-34+000009
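Roughly, the timestamped directory name could be built like this (a sketch; the flag handling and helper name are assumptions):

```python
import os
from datetime import datetime


def test_samples_dir(output_dir: str, epoch: int, timestamp_test_samples: bool) -> str:
    """Directory to write this epoch's test samples into, optionally timestamped."""
    name = "test_samples"
    if timestamp_test_samples:
        stamp = datetime.now().strftime("%Y-%m-%d+%H-%M-%S")
        name = f"{name}-{stamp}+{epoch:06d}"  # e.g. test_samples-2022-08-18+17-49-52+000009
    path = os.path.join(output_dir, name)
    os.makedirs(path, exist_ok=True)
    return path
```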
Script for generating images:
% ./scripts/generate_images.py model 3
modification time on model is 2022-08-18+17-17-48
loading the model from model
loaded the model from model
creating image and saving to generated/model/2022-08-18+17-17-48/image-0000.png
100%|####| 1000/1000 [00:16<00:00, 60.02it/s]
image saved to generated/model/2022-08-18+17-17-48/image-0000.png
creating image and saving to generated/model/2022-08-18+17-17-48/image-0001.png
100%|####| 1000/1000 [00:17<00:00, 65.98it/s]
image saved to generated/model/2022-08-18+17-17-48/image-0001.png
creating image and saving to generated/model/2022-08-18+17-17-48/image-0002.png
100%|####| 1000/1000 [00:17<00:00, 65.31it/s]
image saved to generated/model/2022-08-18+17-17-48/image-0002.png
writing html to generated/model/2022-08-18+17-17-48/images.html
wrote html to generated/model/2022-08-18+17-17-48/images.html
The directory name is based on the model name and the timestamp of its directory.
Successive runs will not clobber existing files but skip over them.
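In outline, the script could look something like the sketch below. It assumes the output directory holds a saved DDPMPipeline and uses the current pipeline API (`.images`); the actual scripts/generate_images.py in the PR may differ.

```python
#!/usr/bin/env python
"""Sketch of scripts/generate_images.py: load a trained pipeline and render N images."""
import os
import sys
from datetime import datetime

import torch
from diffusers import DDPMPipeline  # assumes the model directory was saved as a DDPMPipeline

model_dir, count = sys.argv[1], int(sys.argv[2])

# name the output directory after the model and the modification time of its directory
stamp = datetime.fromtimestamp(os.path.getmtime(model_dir)).strftime("%Y-%m-%d+%H-%M-%S")
out_dir = os.path.join("generated", os.path.basename(model_dir), stamp)
os.makedirs(out_dir, exist_ok=True)

print(f"loading the model from {model_dir}")
pipeline = DDPMPipeline.from_pretrained(model_dir)
pipeline.to("cuda" if torch.cuda.is_available() else "cpu")
print(f"loaded the model from {model_dir}")

html = ["<html><body>"]
for i in range(count):
    name = f"image-{i:04d}.png"
    path = os.path.join(out_dir, name)
    html.append(f'<img src="{name}"/>')
    if os.path.exists(path):  # successive runs skip existing files instead of clobbering them
        continue
    print(f"creating image and saving to {path}")
    image = pipeline(batch_size=1).images[0]  # recent diffusers API; older releases returned a dict
    image.save(path)
    print(f"image saved to {path}")
html.append("</body></html>")

html_path = os.path.join(out_dir, "images.html")
print(f"writing html to {html_path}")
with open(html_path, "w") as f:
    f.write("\n".join(html))
print(f"wrote html to {html_path}")
```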
@anton-l could you check here?
I have not worked with transformers before, so I implemented checkpoints a little differently than what I see in run_glue_no_trainer.py.
In the interest of consistency it might make sense to redo the entire checkpoint / reload mechanism along the lines of run_glue_no_trainer.
I'll happily mirror that logic if it will save you time so you can work on other things, but if you want to implement it yourself so it completely matches the project gestalt, I completely understand.
Hey @luckybit4755 sorry for the late reply! Yeah, feel free to copy the logic over, or I can include the changes into your PR if you don't mind :)
Yeah, that's totally fine if you wouldn't mind. Sorry, I've been a bit distracted as well.
Do we want to copy the "resume" training logic here or leave it for now? cc @anton-l
@anton-l closing this PR for now since there has been no response
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.