DiffuSeq
'grad_norm' is NaN
Hi,
During training, 'grad_norm' is reported as NaN. I am using DiffuSeq-v2 with FP16 for GPU acceleration. What is causing this, and how can it be fixed? Thank you!
Try adding gradient monitoring and logging during training to identify the layer(s) or operation(s) that produce the NaN values.
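For example, a minimal sketch of per-parameter gradient monitoring in plain PyTorch might look like the following. It is not taken from the DiffuSeq code base: the helper names `grad_norm_report` and `nonfinite_params` are hypothetical, and the small `nn.Sequential` model only stands in for the real network. The idea is to call the helpers right after `loss.backward()` and log which parameters have a NaN or Inf gradient norm.

```python
import math

import torch
import torch.nn as nn


def grad_norm_report(model: nn.Module) -> dict:
    """Return {parameter_name: L2 norm of its gradient} (hypothetical helper)."""
    report = {}
    for name, param in model.named_parameters():
        if param.grad is not None:
            # Cast to float32 first so the norm itself does not overflow in FP16.
            report[name] = param.grad.detach().float().norm(2).item()
    return report


def nonfinite_params(report: dict) -> list:
    """Names of parameters whose gradient norm is NaN or Inf (hypothetical helper)."""
    return [name for name, norm in report.items() if not math.isfinite(norm)]


# Toy model and batch standing in for the real training step (assumption).
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
x = torch.randn(16, 4)

loss = model(x).pow(2).mean()
loss.backward()

# Healthy step: every gradient norm is finite.
bad_before = nonfinite_params(grad_norm_report(model))

# Simulate a numerical failure by poisoning one gradient entry with NaN.
model[0].weight.grad[0, 0] = float("nan")
bad_after = nonfinite_params(grad_norm_report(model))

print("non-finite before:", bad_before)
print("non-finite after: ", bad_after)
```

In a real run, logging the output of `nonfinite_params` together with the step number shows which layer goes bad first. PyTorch's `torch.autograd.set_detect_anomaly(True)` can additionally point at the backward operation that produced the NaN, at the cost of slower training. Note that with FP16 and dynamic loss scaling, occasional overflow steps can be expected, so a NaN that persists across many steps is the stronger signal.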
Thank you a lot. I will try~
I'm having the same problem. Have you solved it?