fast-stable-diffusion

ckpt not copied to session folder

Open • SAC020 opened this issue 2 years ago • 13 comments

Hi,

This is the second time today that the ckpt was not copied into the session folder. The session auto-disconnected when training completed (1 hour ago).

image

image

It is not a matter of storage space:

image

SAC020, Dec 09 '22 18:12

Check again by training a sample model with only 30 steps, and see if it gets saved.

TheLastBen, Dec 09 '22 19:12

image

I reconnected and re-ran the exact same sequence (for 30 steps), and it did save the ckpt. I'm not sure what this proves, though.

SAC020, Dec 09 '22 21:12

It means that there was an error that caused the training to stop and, consequently, the runtime to disconnect.

TheLastBen, Dec 09 '22 21:12

OK, but what sort of error? It was a 4-hour session. I monitored it sporadically and it was stable: it was not running out of RAM, VRAM, or disk. I am on Pro+, so it shouldn't disconnect when idle. When I re-ran the exact same sequence, with the same baseline ckpt and the same images, it finished with "Done, ckpt is in your folder". I don't understand what sort of error it could have been.

It might be better not to clear the visible log when displaying "done". Maybe there was an earlier error message?

SAC020, Dec 09 '22 21:12

It could be a gdrive error too.

TheLastBen, Dec 09 '22 21:12

The same just happened to me, but I watched it hit 100% and complete without errors.

dalmackay, Dec 09 '22 23:12

Make sure you have enough gdrive space, or clear the cookies from time to time.
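One way to rule out space before a run ends is to compare the checkpoint size with the free space reported for the destination folder. The following is only a minimal sketch: `has_room_for` and its `safety_margin` parameter are hypothetical names, not part of the notebook, and it assumes the mounted Drive reports meaningful free space through the filesystem, which may not hold for every mount.

```python
import os
import shutil


def has_room_for(src_path, dest_dir, safety_margin=0.1):
    """Return True if dest_dir reports enough free space for src_path.

    Hypothetical helper: safety_margin is extra headroom expressed as a
    fraction of the file size (0.1 means 10% more than the file itself).
    """
    needed = os.path.getsize(src_path) * (1 + safety_margin)
    # shutil.disk_usage returns a named tuple (total, used, free) in bytes.
    free = shutil.disk_usage(dest_dir).free
    return free >= needed
```

A check like this only covers the space question. It says nothing about whether the copy itself completed.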

TheLastBen, Dec 09 '22 23:12

For the time being I have changed the sleep(2) to sleep(20). So far I've had no more missing ckpts.
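A longer fixed delay presumably just gives the copy more time to finish before the runtime goes away. An alternative to guessing the delay is to poll the destination until the file exists and its size stops changing. This is a sketch under that assumption, not the notebook's actual code: `wait_for_stable_file` and all of its parameters are hypothetical.

```python
import os
import time


def wait_for_stable_file(path, timeout=120.0, poll_interval=1.0, stable_checks=3):
    """Return True once `path` exists, is non-empty, and has kept the same
    size for `stable_checks` consecutive polls; return False on timeout."""
    deadline = time.monotonic() + timeout
    last_size = -1
    unchanged = 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file not there yet (or not readable)
        if size > 0 and size == last_size:
            unchanged += 1
            if unchanged >= stable_checks:
                return True
        else:
            unchanged = 0  # size changed or file missing: start counting again
        last_size = size
        time.sleep(poll_interval)
    return False
```

On a mounted Google Drive, a stable local size does not guarantee that the upload to Drive has finished, so this is at best a partial check. In Colab, `drive.flush_and_unmount()` from `google.colab` is meant to flush pending writes before the runtime disconnects.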

SAC020, Dec 11 '22 15:12

Great, I'll fix it.

TheLastBen, Dec 11 '22 16:12

I have the same issue:

Converting to CKPT... Killed Done, Resuming training...

There is no CKPT file in Drive, even though there is plenty of space in Drive.

pranavmehta, Dec 11 '22 20:12

That's a RAM issue, but the final CKPT will get saved.
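A bare `Killed` in the output is typically what a Linux shell prints when a process is terminated by SIGKILL, which is commonly the out-of-memory killer at work. To see how much RAM is left before the conversion step, one option is to read `MemAvailable` from `/proc/meminfo`. This is a minimal, Linux-only sketch, and both function names are hypothetical:

```python
def mem_available_gib(meminfo_text):
    """Parse the MemAvailable value (reported in kB) from /proc/meminfo-style
    text and return it in GiB, or None if the field is absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kb = int(line.split()[1])
            return kb / (1024 ** 2)
    return None


def read_mem_available_gib(path="/proc/meminfo"):
    """Read the file at `path` (Linux only) and return available RAM in GiB."""
    with open(path) as f:
        return mem_available_gib(f.read())
```

Calling `read_mem_available_gib()` in a notebook cell shortly before the conversion gives a rough idea of whether the step is likely to fit in memory.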

TheLastBen, Dec 12 '22 06:12

> Great, I'll fix it

You have changed only the second "sleep", not the first one.

SAC020, Dec 12 '22 09:12

Yes, I forgot the v2, thanks.

TheLastBen, Dec 12 '22 09:12