Paella
Gradient explosion
I am training Paella on MSCOCO, with the model scaled down slightly to 247M parameters. However, the training loss suddenly increases and then becomes NaN. How can I solve this problem?
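A minimal sketch for catching the step where the loss first blows up. It assumes a standard PyTorch training loop in which you can read the scalar loss each step and, optionally, the total gradient norm (for example, the value returned by `torch.nn.utils.clip_grad_norm_`). `DivergenceMonitor`, `window`, and `spike_factor` are hypothetical names with assumed defaults. Nothing here comes from the Paella code base.

```python
import math
from collections import deque
from typing import Optional


class DivergenceMonitor:
    """Hypothetical helper: flags non-finite values and sudden loss spikes.

    window and spike_factor are illustrative defaults, not Paella settings.
    """

    def __init__(self, window: int = 100, spike_factor: float = 3.0):
        # Moving window of recent "normal" losses used as the baseline.
        self.history = deque(maxlen=window)
        self.spike_factor = spike_factor

    def check(self, loss: float, grad_norm: Optional[float] = None) -> str:
        # A NaN/inf loss or gradient norm means training has already diverged.
        if not math.isfinite(loss):
            return "nan"
        if grad_norm is not None and not math.isfinite(grad_norm):
            return "nan"
        # Only compare against the baseline once the window is full.
        if len(self.history) == self.history.maxlen:
            mean = sum(self.history) / len(self.history)
            if loss > self.spike_factor * mean:
                # Spikes are not added, so they do not inflate the baseline.
                return "spike"
        self.history.append(loss)
        return "ok"


if __name__ == "__main__":
    monitor = DivergenceMonitor(window=3, spike_factor=3.0)
    for step, loss in enumerate([1.0, 0.9, 0.8, 0.85, 5.0, float("nan")]):
        print(step, monitor.check(loss))
```

In a real loop you would call `monitor.check(loss.item(), grad_norm.item())` after the backward pass and save a checkpoint or stop when it returns something other than `"ok"`. This makes it easier to see whether the gradient norm grows before the loss spikes.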