DiffBIR
Failure during training
I started training on an A100 GPU with about 2,000 training images. It completed about 900 epochs, and then the process ended abruptly without any error message. I can see several checkpoint `step` files.
I also tried to restart the training by setting the `resume` path to the folder containing the `step` files, but that gives the error `{folder} is a directory`.
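The error message suggests that the resume setting may expect a single checkpoint file rather than the folder that holds the files. Below is a minimal sketch of how the most recent file in that folder could be located. It is not part of DiffBIR: the helper name `latest_checkpoint`, the file extensions, and the idea that the step number is the last integer in the file name are all assumptions for illustration.

```python
import re
from pathlib import Path
from typing import Optional, Tuple


def latest_checkpoint(
    folder: str,
    suffixes: Tuple[str, ...] = (".pt", ".ckpt", ".pth"),  # assumed extensions
) -> Optional[Path]:
    """Return the checkpoint file whose name carries the largest step number.

    Hypothetical helper: assumes the step count is the last run of digits
    in the file name (without its extension). Returns None if nothing matches.
    """
    best_step = -1
    best_path = None
    for path in Path(folder).iterdir():
        # Skip sub-folders and files that do not look like checkpoints.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        numbers = re.findall(r"\d+", path.stem)
        if not numbers:
            continue
        step = int(numbers[-1])
        if step > best_step:
            best_step, best_path = step, path
    return best_path
```

If this returns a path, that full file path (not the folder) would be the value to try for the resume setting.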
Any help would be highly appreciated.
Thanks