zero123
Zero123 training not working on A40 GPUs (46 GB memory each)
Hello authors,
I'm running the training script, main.py. I have two A40 GPUs, each with 46 GB of memory, and I have reduced the batch size all the way down to 1. When I set num_workers to 0, the code stops abruptly at Epoch 0. When I set num_workers to a nonzero value, it throws this error: "RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e RuntimeError: DataLoader worker (pid(s) 32823, 32919) exited unexpectedly".
I looked this error up online, and people suggest setting num_workers to 0, but as described above that doesn't solve the problem either. Can you please tell me what the issue is?
Edit: accumulate_grad_batches is set to 1 in my case. Should I change it to 4?
I am getting this error too. Any idea why this happens?
Me too, I am also getting this error. Does anyone know how to fix it?
Have you fixed it?
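One possible first diagnostic step, offered as a guess rather than a confirmed cause: a run that stops abruptly with num_workers set to 0, and DataLoader workers that exit unexpectedly otherwise, can both occur when the host (not the GPU) runs short of RAM or shared memory. Nothing in this thread confirms that is what happens here. The sketch below uses only the Python standard library to print host-side memory figures on Linux. The function name `host_memory_report` and its parameters are hypothetical and are not part of the zero123 code.

```python
import os
import shutil


def host_memory_report(shm_path="/dev/shm", meminfo_path="/proc/meminfo"):
    """Return host-side memory figures in GiB; values that cannot be read are None.

    Hypothetical helper for illustration. Assumes a Linux host where
    /dev/shm and /proc/meminfo exist.
    """
    report = {
        "shm_total_gib": None,
        "shm_free_gib": None,
        "mem_available_gib": None,
    }

    # Shared memory is what DataLoader worker processes use to pass tensors.
    if os.path.isdir(shm_path):
        usage = shutil.disk_usage(shm_path)
        report["shm_total_gib"] = usage.total / 2**30
        report["shm_free_gib"] = usage.free / 2**30

    # MemAvailable is reported in kB (KiB); dividing by 2**20 gives GiB.
    if os.path.isfile(meminfo_path):
        with open(meminfo_path) as f:
            for line in f:
                if line.startswith("MemAvailable:"):
                    report["mem_available_gib"] = int(line.split()[1]) / 2**20
                    break

    return report


if __name__ == "__main__":
    for name, value in host_memory_report().items():
        shown = "unavailable" if value is None else f"{value:.2f} GiB"
        print(f"{name}: {shown}")
```

Running this just before starting main.py, and again while it is loading data, would show whether available host memory or free shared memory drops close to zero before the crash. As for the Edit above: accumulate_grad_batches controls how many batches are accumulated before each optimizer step, so changing it from 1 to 4 would probably not affect a crash that happens during data loading, though that is also an assumption.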