DiffuSeq 'grad_norm' is NaN

Open LikeStarting opened this issue 1 year ago • 3 comments

Hi, during the training step the reported 'grad_norm' is NaN. I am using DiffuSeq-v2 with FP16 for GPU acceleration. What is causing this, and how can it be fixed? Thank you!

May 10 '24 02:05 LikeStarting

I suggest adding gradient monitoring and logging during training to identify which layer(s) or operation(s) first produce the NaN.
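As a rough illustration of what such monitoring could look like, here is a minimal, framework-agnostic sketch. It is not taken from the DiffuSeq code base: the helper names `find_nonfinite_grads` and `grad_norm` and the layer names in the demo are hypothetical. It assumes the gradients have already been flattened into plain Python floats; in a PyTorch training loop one would typically build these pairs from `model.named_parameters()` and each parameter's `.grad` after the backward pass. With FP16, values above roughly 65504 overflow to Inf, which then turns the norm into Inf or NaN, so both cases are reported.

```python
import math


def find_nonfinite_grads(named_grads):
    """Return [(name, n_bad, n_total)] for every entry containing NaN or Inf.

    `named_grads` is an iterable of (name, values) pairs, where `values`
    is a flat iterable of floats (hypothetical input format).
    """
    report = []
    for name, values in named_grads:
        values = list(values)
        n_bad = sum(1 for v in values if not math.isfinite(v))
        if n_bad:
            report.append((name, n_bad, len(values)))
    return report


def grad_norm(named_grads):
    """Global L2 norm over all gradients; NaN if any single entry is NaN."""
    total = 0.0
    for _, values in named_grads:
        for v in values:
            total += v * v
    return math.sqrt(total)


# Toy gradients with made-up layer names: one healthy layer, one broken one.
grads = {
    "word_embedding.weight": [0.01, -0.02, 0.03],
    "transformer.layer0.attn.weight": [0.5, float("nan"), float("inf")],
}

bad = find_nonfinite_grads(grads.items())
for name, n_bad, n_total in bad:
    print(f"non-finite gradient in {name}: {n_bad}/{n_total} entries")
```

Logging this report at every step, and stopping at the first step where it is non-empty, narrows the problem down to a specific layer instead of a single global NaN.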

May 30 '24 07:05 summmeer

Thanks a lot. I will try it.

Jun 02 '24 05:06 LikeStarting

I'm having the same problem. Have you solved it?

Jul 19 '24 02:07 X-fxx
