Open-Sora

The gradient of all parameters is None

Open · nankepan opened this issue 1 year ago • 3 comments

[screenshot] Hi, I printed param.grad here and found that the gradient of every parameter is None. Is this caused by using ColossalAI? How can I obtain the parameter gradients? Thank you.

nankepan, Apr 16 '24

It should only be None after optimizer.zero_grad(); booster.backward(loss, optimizer) just calls the boosted optimizer's backward(loss). Would you mind printing the value of loss to check whether it is NaN?

JThh, Apr 16 '24
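For comparison, here is a minimal sketch in plain PyTorch (no ColossalAI involved) of the behavior described above: p.grad is empty before the first backward pass, is filled in by loss.backward(), and is reset to None by optimizer.zero_grad(set_to_none=True). It also shows the suggested NaN check on the loss. The tiny Linear model and random data are illustrative assumptions only; a wrapper that manages gradients itself may behave differently.

```python
import torch

# Tiny illustrative model and optimizer in plain PyTorch (not the Open-Sora setup).
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)
y = torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(x), y)

# Check that the loss is a real number before reasoning about gradients.
loss_is_nan = bool(torch.isnan(loss))

# Before any backward pass, .grad has never been populated.
grads_before = [p.grad for p in model.parameters()]

loss.backward()
# In plain PyTorch, backward() stores a dense gradient on each parameter.
grads_after_backward = [p.grad is not None for p in model.parameters()]

optimizer.step()
# set_to_none=True (explicit here for older PyTorch versions) resets .grad to None.
optimizer.zero_grad(set_to_none=True)
grads_after_zero = [p.grad for p in model.parameters()]
```

If p.grad is None right after backward in your own loop, the gradients are being kept somewhere other than the parameter tensors.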


Thanks for the reply. The loss is normal, but the gradient is already None before optimizer.zero_grad(), which is strange. I trained the model, the loss decreased steadily, and the model's performance improved, so the None gradients confuse me.

nankepan, Apr 17 '24

This is because ColossalAI manages the gradients itself, so you cannot access them directly through param.grad. @ver217 Could you please help with this?

zhengzangw, May 10 '24

Hi, gradients are managed by the ZeRO optimizer, so p.grad is None. This is expected behavior. If you want to check the gradients manually, you can refer to the following test (note that the gradients there are sharded across ranks and flattened): https://github.com/hpcaitech/ColossalAI/blob/7f8b16635b42013b73e1cb1ffdebc07b4d71ac93/tests/test_zero/test_low_level/test_zero1_2.py#L164

ver217, Jun 24 '24
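To make "sharded and flat" concrete, here is a minimal pure-Python sketch of one plausible layout, assuming each parameter's gradient is flattened in row-major order, zero-padded to a multiple of the data-parallel world size, and split evenly so that rank r keeps shard r. The helper names flatten_2d and shard_flat_grad are hypothetical and are not part of ColossalAI's API; the real optimizer works on tensors and may bucket or order parameters differently.

```python
from typing import List, Sequence


def flatten_2d(rows: Sequence[Sequence[float]]) -> List[float]:
    """Flatten a 2-D gradient in row-major order (hypothetical helper)."""
    return [v for row in rows for v in row]


def shard_flat_grad(grad: Sequence[float], world_size: int) -> List[List[float]]:
    """Zero-pad a flat gradient to a multiple of world_size and split it evenly.

    Under the assumed layout, shard i is the only piece that rank i holds,
    so no single rank sees the full gradient.
    """
    padding = (world_size - len(grad) % world_size) % world_size
    padded = list(grad) + [0.0] * padding
    shard_len = len(padded) // world_size
    return [padded[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]


# A 2x3 gradient split across 4 ranks: 6 values + 2 padding zeros -> 4 shards of 2.
full_grad = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
shards = shard_flat_grad(flatten_2d(full_grad), world_size=4)
```

With this layout, the shard a rank holds has lost the parameter's original shape and may end in padding, which is why it cannot be compared to a dense p.grad directly.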

I also need to extract p.grad for subsequent calculations. Is there any way to get p.grad correctly? I have read the code linked above but still don't know how to do it.


281LinChenjian, Jul 22 '24
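Recovering a full, shaped gradient from such shards amounts to reversing that layout. The following pure-Python sketch assumes the same hypothetical layout (row-major flattening, zero padding at the end, shards ordered by rank) and that the shards from every rank are already available in one process; in a real multi-GPU job they would first have to be collected, for example with torch.distributed.all_gather. The name unshard_grad is hypothetical and does not correspond to a ColossalAI function.

```python
from typing import List, Sequence, Tuple


def unshard_grad(
    shards: Sequence[Sequence[float]], shape: Tuple[int, int]
) -> List[List[float]]:
    """Rebuild a full 2-D gradient from per-rank flat shards (hypothetical helper).

    Assumes shards are ordered by rank and that any extra trailing values
    are zero padding added when the gradient was split.
    """
    rows, cols = shape
    numel = rows * cols
    # Concatenate the shards in rank order to get the padded flat gradient.
    flat = [v for shard in shards for v in shard]
    if len(flat) < numel:
        raise ValueError("shards hold fewer elements than the target shape")
    # Drop the padding, then restore the row-major 2-D shape.
    flat = flat[:numel]
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


# Four per-rank shards of a 2x3 gradient, the last one containing only padding.
rebuilt = unshard_grad([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 0.0]], (2, 3))
```

Whether this matches the actual optimizer state depends on how ColossalAI lays out its gradient buffers, so it is worth checking the result against a dense p.grad from a single-GPU, non-ZeRO run first.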