DiffuSeq

'grad_norm' is NaN

Open LikeStarting opened this issue 1 year ago • 3 comments

Hi, during the training step, 'grad_norm' becomes NaN. I am using DiffuSeq-v2 with FP16 for GPU acceleration. What is causing this, and how can it be fixed? Thank you! (image attached)

LikeStarting, May 10 '24 02:05

I suggest adding gradient monitoring and logging during training to identify which layer(s) or operation(s) produce the NaN.
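As a rough illustration of what such monitoring could look like, here is a minimal sketch, assuming a PyTorch-style training loop. The helper name `find_nonfinite_grads` is hypothetical and is not part of DiffuSeq; it only relies on each parameter exposing `.grad` with `.float()` and `.norm()`, as PyTorch tensors do.

```python
import math


def find_nonfinite_grads(named_parameters):
    """Return (name, norm) pairs for parameters whose gradient norm is NaN or inf.

    Hypothetical helper, not part of DiffuSeq. `named_parameters` is an
    iterable of (name, parameter) pairs, e.g. `model.named_parameters()`
    in PyTorch.
    """
    bad = []
    for name, param in named_parameters:
        grad = param.grad
        if grad is None:
            # Parameter did not take part in this backward pass.
            continue
        # Cast to float32 first so the norm itself cannot overflow in FP16.
        norm = float(grad.float().norm())
        if not math.isfinite(norm):
            bad.append((name, norm))
    return bad
```

One possible way to use it is to call `find_nonfinite_grads(model.named_parameters())` right after `loss.backward()` and log the returned names; the first layers to appear are a reasonable place to start looking. Note that with FP16 loss scaling, occasional `inf` gradients on a few steps can be expected while the scale adjusts, so persistent NaN values are the more telling signal.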

summmeer, May 30 '24 07:05

Thank you very much. I will try it.

LikeStarting, Jun 02 '24 05:06

I'm having the same problem. Have you solved it?

X-fxx, Jul 19 '24 02:07