Diff-UNet

Stuck in an epoch

Open XFivezzz opened this issue 1 year ago • 1 comment

When I apply this model to the verse2020 dataset, training gets stuck at the ninth epoch every time and then terminates with: RuntimeError: DataLoader worker (pid 9063) is killed by signal: killed. I have tried a higher-performance GPU and CPU and adjusted the learning rate, batch size, etc., but it still gets stuck at the ninth epoch, and the progress display shows that the epoch will take ten hours.

XFivezzz, Jul 25 '23 10:07

Your validation set is very large, so you can validate on only a subset of it. You can also reduce the number of DDIM sampling steps from 10 to 2, which further speeds up inference.

920232796, Jul 25 '23 10:07
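
The suggestion above to validate on only part of the validation data could be sketched roughly as follows. This is a minimal illustration, not code from the Diff-UNet repository: the helper name pick_validation_indices, the subset size, and the seed are all assumptions made for the example. It picks a fixed, reproducible set of sample indices so that every validation pass scores the same cases.

```python
import random


def pick_validation_indices(n_total, n_keep, seed=0):
    """Return a sorted, reproducible subset of indices from range(n_total).

    Hypothetical helper for illustration only; it is not part of Diff-UNet.
    """
    if n_total < 0 or n_keep < 0:
        raise ValueError("n_total and n_keep must be non-negative")
    # Never ask for more samples than the dataset contains.
    n_keep = min(n_keep, n_total)
    # A dedicated generator keeps the choice stable across epochs and runs
    # without touching the global random state used elsewhere in training.
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_total), n_keep))


if __name__ == "__main__":
    # Assumed sizes: 200 validation cases, of which 20 are scored per pass.
    indices = pick_validation_indices(n_total=200, n_keep=20)
    print(len(indices), "validation cases selected")
```

With PyTorch, the resulting indices can be passed to torch.utils.data.Subset(val_ds, indices) before building the validation DataLoader, where val_ds stands in for whatever validation dataset object the training script already creates. Keeping the seed fixed makes validation scores comparable from one epoch to the next.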