nerfies
Training hangs indefinitely
I am trying to train a nerfie on 8 GPUs, but training hangs early every time, somewhere between roughly 1k and 5k steps. Decreasing the batch size only delays the point at which it freezes, so I suspect some kind of memory leak is causing the problem. The hang does not occur when training on CPU only. I have not changed your code. Has anyone reported a similar problem?