Alex Nichol
Could you post code that reproduces the issue? Preferably, provide an image that causes the failure as well.
I hit the bug that this fixes. It drove me nuts for days and was tough to find. Please merge this!
Can you post the entire stack trace / code that you are using to produce this issue?
This is unofficial, but some people have found it possible to achieve this by randomizing the class label at every timestep. Example: https://twitter.com/RiversHaveWings/status/1423034386354561024
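The trick mentioned above, drawing a fresh class label at every timestep, might be sketched roughly as follows. This is only an illustration: `denoise_step` is a hypothetical stand-in for one reverse step of a class-conditional sampler, not a function from any particular library, and the argument names are assumptions.

```python
import random
from typing import Callable, List, Tuple


def sample_with_random_labels(
    denoise_step: Callable[[float, int, int], float],  # hypothetical: (x, t, y) -> new x
    x: float,
    num_timesteps: int,
    num_classes: int,
    rng: random.Random,
) -> Tuple[float, List[int]]:
    """Run a reverse sampling loop, picking a new random class label at each step."""
    labels_used = []
    for t in reversed(range(num_timesteps)):
        # Draw a fresh label for this timestep instead of fixing one for the whole run.
        y = rng.randrange(num_classes)
        x = denoise_step(x, t, y)
        labels_used.append(y)
    return x, labels_used


def toy_step(x: float, t: int, y: int) -> float:
    """Toy stand-in for a real model call; it just counts the steps taken."""
    return x + 1.0


final_x, labels = sample_with_random_labels(toy_step, 0.0, 10, 1000, random.Random(0))
```

In a real sampler, `x` would be an image tensor and `denoise_step` would call the class-conditional model with label `y`.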
Hi Shoufa, It's quite hard to actually utilize every FLOP available on the GPU. When you run a command like `nvidia-smi` and it claims you are at 100%, that does...
Hi Shoufa, Could you please send the exact command you are running for training? This is indeed a NaN during the forward pass (hence losses are NaN), which looks like...
Do you have a record of the loss before the NaN occurred? Did it spike right before NaNs started happening? Your command itself looks good to me, so I don't...
Perhaps this bug is related to this issue: https://github.com/openai/guided-diffusion/issues/44 If so, we could try patching that bug and see whether the error resolves itself. The patch would involve...
Reptile is definitely simple to scale across multiple machines, since each machine just has to run a separate inner loop and then average the parameters at the end. One thing...
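As a rough illustration of the scheme described above (each machine runs its own inner loop, then the parameters are averaged), here is a minimal single-process sketch. The quadratic toy task, the function names, and the learning rates are all assumptions made for the example, and the workers are simulated sequentially rather than on separate machines.

```python
from typing import List, Sequence


def inner_loop(params: Sequence[float], target: Sequence[float],
               lr: float = 0.1, steps: int = 5) -> List[float]:
    """One worker's inner loop: SGD on the toy loss 0.5 * (p - target)^2."""
    phi = list(params)
    for _ in range(steps):
        # Gradient of the toy loss with respect to each parameter is (p - t).
        phi = [p - lr * (p - t) for p, t in zip(phi, target)]
    return phi


def average_params(param_sets: Sequence[Sequence[float]]) -> List[float]:
    """Average parameters element-wise across workers."""
    n = len(param_sets)
    return [sum(vals) / n for vals in zip(*param_sets)]


def reptile_step(params: Sequence[float],
                 worker_targets: Sequence[Sequence[float]],
                 outer_lr: float = 1.0) -> List[float]:
    """Each simulated worker adapts from the shared params; then move toward the average."""
    adapted = [inner_loop(params, target) for target in worker_targets]
    mean = average_params(adapted)
    return [p + outer_lr * (m - p) for p, m in zip(params, mean)]


# Two simulated workers, each with its own toy task.
new_params = reptile_step([0.0, 0.0], [[1.0, 2.0], [-1.0, 2.0]])
```

With `outer_lr=1.0` the shared parameters are replaced by the plain average of the workers' results; a smaller value interpolates between the old parameters and that average.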
Thanks for reporting this. Are some of the ImageNet images valid? If so, is there a general pattern as to which ones are empty? I'm not sure what could cause...