Open-Sora
the gradient of all parameters is None
Hi,
I printed param.grad here and found that the gradient of every parameter is None. Is this caused by using ColossalAI? How can I obtain the gradients of the parameters? Thank you.
It should only be None after optimizer.zero_grad(); booster.backward calls torch.optim.Optimizer.backward(loss). Would you mind printing the contents of loss to see if it is NaN?
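A minimal sketch of the suggested check (the `loss` and `step` names are assumed to come from your own training loop; this is not Open-Sora's code):

```python
import torch

def check_loss(loss: torch.Tensor, step: int) -> None:
    """Print the loss and warn if it has gone NaN/Inf before backward."""
    if not torch.isfinite(loss).all():
        print(f"[step {step}] loss is not finite: {loss}")
    else:
        print(f"[step {step}] loss = {loss.item():.4f}")
```

You could call this right before the `booster.backward(loss, optimizer)` call in the training step.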
Thanks for the reply. The loss is normal, but the gradients are None even before optimizer.zero_grad(), which is strange.
I trained the model; the loss decreased steadily and the model's performance also improved. But the None gradients confuse me.
This is because ColossalAI manages the gradients, so you cannot access them directly via param.grad. @ver217 Could you please help with this?
Hi, the gradients are managed by the ZeRO optimizer, so p.grad is None. This is expected behavior. If you want to check the grads manually, you can refer to https://github.com/hpcaitech/ColossalAI/blob/7f8b16635b42013b73e1cb1ffdebc07b4d71ac93/tests/test_zero/test_low_level/test_zero1_2.py#L164
Note that the grads are sharded and flattened.
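For reference, a hedged sketch of inspecting those stored gradients, adapted from the linked test into a helper. The `_grad_store` accessor below is internal to ColossalAI's LowLevelZeroOptimizer, so its name and behavior may differ between versions:

```python
import torch

def print_sharded_grads(zero_optimizer, model, group_id: int = 0) -> None:
    """Print this rank's flat gradient shard(s) for each parameter.

    Assumes `zero_optimizer` is the ColossalAI LowLevelZeroOptimizer (the one
    wrapped by the booster) whose grad store exposes
    get_partitioned_gradients_by_param_id, as used in the linked test.
    """
    for name, param in model.named_parameters():
        shards = zero_optimizer._grad_store.get_partitioned_gradients_by_param_id(group_id, id(param))
        if not shards:
            print(f"{name}: no gradient stored on this rank")
            continue
        for i, shard in enumerate(shards):
            # Each entry is a flat slice of the gradient held by this rank.
            print(f"{name}[shard {i}]: shape {tuple(shard.shape)}, norm {shard.norm().item():.4e}")
```

This would typically be called between booster.backward(...) and optimizer.step(), while the working gradients are still held by the optimizer.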
I also need to extract p.grad for subsequent calculations. Is there a way to get p.grad correctly? I have read the code above but still don't know how to do it.
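One possible way to rebuild a full gradient from the sharded, flat storage, building on the accessor above. This is only a sketch under two assumptions that may not hold for every ColossalAI version: each rank holds a single, equally sized flat shard of the parameter's gradient, and the shards concatenate in rank order:

```python
import torch
import torch.distributed as dist

def gather_full_grad(zero_optimizer, param: torch.nn.Parameter, group_id: int = 0) -> torch.Tensor:
    """Gather every rank's flat shard and reshape it to the parameter's shape."""
    shards = zero_optimizer._grad_store.get_partitioned_gradients_by_param_id(group_id, id(param))
    # Assumption: this rank holds exactly one flat shard for the parameter.
    local = shards[0]
    world_size = dist.get_world_size()
    gathered = [torch.empty_like(local) for _ in range(world_size)]
    dist.all_gather(gathered, local)
    flat = torch.cat(gathered)
    # Flat buffers are usually padded to split evenly across ranks; trim the padding.
    return flat[: param.numel()].view_as(param)
```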