Open-Sora: the gradient of all parameters is None

Hi, I print param.grad here and find that the gradient of every parameter is None. Is this caused by using ColossalAI? How can I obtain the gradients of the parameters? Thank you.

Apr 16 '24 14:04 nankepan

It should only be None after optimizer.zero_grad(); booster.backward just calls optimizer.backward(loss). Would you mind printing the loss to check whether it is NaN?

Apr 16 '24 16:04 JThh

Thanks for the reply. The loss is normal, but the gradient is already None before optimizer.zero_grad(), which is strange. I trained the model, and the loss decreased steadily while model performance improved, so the None gradients confuse me.

Apr 17 '24 02:04 nankepan

This is because ColossalAI manages the gradients itself, so you cannot access them directly through param.grad. @ver217 Could you please help with this?

May 10 '24 06:05 zhengzangw

Hi, gradients are managed by the ZeRO optimizer, so p.grad is None. This is expected behavior. If you want to check the grads manually, you can refer to https://github.com/hpcaitech/ColossalAI/blob/7f8b16635b42013b73e1cb1ffdebc07b4d71ac93/tests/test_zero/test_low_level/test_zero1_2.py#L164 Note that the grad is sharded and flat.

Jun 24 '24 07:06 ver217

I also need to extract p.grad for subsequent calculations. Is there any way to get p.grad correctly? I have read the above code but still don't know how to do it.

Jul 22 '24 07:07 281LinChenjian
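
For readers unsure what "sharded and flat" means in the reply above, here is a minimal, framework-free sketch of the idea. It is not ColossalAI's API: the helper names (shard_flat_grad, rebuild_grad), the zero padding, and the contiguous per-rank layout are all assumptions made for illustration, and plain Python lists stand in for tensors. In real training each shard lives on a different rank, so the pieces would first have to be collected with a collective such as torch.distributed.all_gather before being reshaped.

```python
from math import prod


def flatten(nested):
    """Flatten an arbitrarily nested list of numbers into one flat list."""
    if not isinstance(nested, list):
        return [nested]
    flat = []
    for item in nested:
        flat.extend(flatten(item))
    return flat


def shard_flat_grad(grad, world_size):
    """Hypothetical helper: flatten a gradient, zero-pad it so it divides
    evenly, and split it into one contiguous shard per rank."""
    flat = flatten(grad)
    pad = (-len(flat)) % world_size  # elements needed to reach a multiple
    flat = flat + [0.0] * pad
    shard_len = len(flat) // world_size
    return [flat[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]


def unflatten(flat, shape):
    """Reshape a flat list into nested lists with the given shape."""
    if len(shape) == 1:
        return list(flat)
    step = prod(shape[1:])
    return [unflatten(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


def rebuild_grad(shards, shape):
    """Hypothetical helper: concatenate the shards from all ranks in rank
    order, drop the padding, and restore the parameter's shape."""
    flat = [x for shard in shards for x in shard]
    return unflatten(flat[:prod(shape)], shape)


# A made-up 2x3 gradient split across 4 simulated ranks.
grad = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
shards = shard_flat_grad(grad, world_size=4)
restored = rebuild_grad(shards, shape=(2, 3))
```

The point of the sketch is that a single rank only holds a 1-D slice of the gradient, so it cannot be compared with the parameter element by element until the slices from all ranks are concatenated, trimmed, and reshaped. The exact layout ColossalAI uses may differ, so the linked test remains the reference for the real calls.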
