How to restart training?

Open Askejm opened this issue 2 years ago • 5 comments

I want to train a bunch of stems, and I figured I could do this with the --restart_every argument. So I set it to 58k secs, but when it reaches that it just exits. How would I make it restart by itself? I want it to make a bunch of different stems that are 1k kimg each. Windows 10, RTX 3070.

Askejm, Jul 21 '22 10:07

when it reaches that it just exits

https://github.com/autonomousvision/stylegan_xl/blob/223430d0adb5eabee9f7e38e0bce73fc4b1818f6/training/training_loop.py#L133

woctezuma, Jul 21 '22 20:07

Could you elaborate on what you mean by that?

Askejm, Jul 22 '22 06:07

The argument to --restart_every is the time interval, in seconds, after which the program exits. So this works as intended.

What is stopping you from starting the program again after it exited? Is it supposed to automatically restart?

https://github.com/autonomousvision/stylegan_xl/blob/223430d0adb5eabee9f7e38e0bce73fc4b1818f6/training/training_loop.py#L174-L177

Be careful, because it seems to resume from the last checkpoint when you manually restart the program.

Maybe just set the training duration another way, and start the training from scratch a bunch of times.
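As a rough sketch of that idea, a small wrapper script could launch the training command several times in a row, giving each run its own output directory. This assumes that resuming is tied to the output directory, so a fresh directory means training from scratch; I have not verified that. The function name `run_sequential` and the `stem_XX` folder naming are made up for illustration, and the flags in the usage comment are placeholders for whatever command already works for you.

```python
import subprocess
import sys
from pathlib import Path


def run_sequential(base_cmd, outdir_root, n_runs, outdir_flag="--outdir"):
    """Run base_cmd n_runs times, one after another.

    Each run gets its own subdirectory under outdir_root, passed via
    outdir_flag. Assumption: the training script resumes only when it finds
    a checkpoint in its output directory, so a new directory starts fresh.
    Returns a list of (outdir, returncode) pairs.
    """
    results = []
    for i in range(n_runs):
        outdir = Path(outdir_root) / f"stem_{i:02d}"  # hypothetical naming
        outdir.mkdir(parents=True, exist_ok=True)
        cmd = list(base_cmd) + [f"{outdir_flag}={outdir}"]
        # Blocks until this run exits, then the loop starts the next one.
        completed = subprocess.run(cmd)
        results.append((str(outdir), completed.returncode))
    return results


# Example usage (placeholder flags; substitute your own working command,
# and limit each run's length with the script's duration option):
#
#   run_sequential(
#       [sys.executable, "train.py", "--kimg=1000"],  # plus your usual flags
#       "./training-runs",
#       n_runs=5,
#   )
```

Because the runs are sequential, only one training process uses the GPU at a time, and nobody has to be at the keyboard between runs.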

woctezuma, Jul 22 '22 09:07

Yeah, I noticed it resumes. But would you know a way to do it automatically? I might just open a bunch of anaconda prompts and have them offset by 58k secs. I can't always manually restart it because I'm on holiday.

Askejm, Jul 22 '22 13:07

I might just open a bunch of anaconda prompts and have them offset by 58k secs.

That is what I would do as well.

woctezuma, Jul 22 '22 14:07