
[BUG]: Hybrid kernel went wrong when there were both fp16 and fp32 gradients

Open 1SAA opened this issue 2 years ago • 0 comments

🐛 Describe the bug

When a model has both fp16 and fp32 gradients, hybrid Adam may be unable to update its parameters correctly. The cause is that we put all parameters into a single list in colossalai/nn/optimizer/hybrid_adam.py. Here it is: [image]. But our CUDA kernel is written like this: [image]. The kernel checks only the dtype of the first element in the list, so any tensor in the list with a different dtype may be handled with the wrong dtype.
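To illustrate the failure mode and one possible way around it, here is a minimal sketch in plain Python. It is not the ColossalAI code or its actual fix: `FakeTensor`, `first_element_dtype`, and `group_by_grad_dtype` are hypothetical names, and dtypes are represented as plain strings. The idea is to split the parameter list into one bucket per gradient dtype, so that each kernel launch sees a homogeneous list and a first-element dtype check becomes safe.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor; only the dtype attribute matters here."""
    name: str
    dtype: str


def first_element_dtype(grads):
    """Mimics the pattern described above: one dtype is chosen from the first entry only."""
    return grads[0].dtype


def group_by_grad_dtype(params, grads):
    """Split parallel param/grad lists into buckets keyed by the gradient dtype."""
    if len(params) != len(grads):
        raise ValueError("params and grads must have the same length")
    buckets = defaultdict(lambda: ([], []))
    for p, g in zip(params, grads):
        bucket_params, bucket_grads = buckets[g.dtype]
        bucket_params.append(p)
        bucket_grads.append(g)
    return dict(buckets)


if __name__ == "__main__":
    params = [FakeTensor(f"w{i}", "fp32") for i in range(3)]
    grads = [FakeTensor("g0", "fp16"), FakeTensor("g1", "fp32"), FakeTensor("g2", "fp16")]

    # A single mixed list: the first-element check ignores that g1 has another dtype.
    print("dtype assumed for the whole list:", first_element_dtype(grads))

    # Per-dtype buckets: each bucket could be passed to the kernel separately.
    for dtype, (bucket_params, bucket_grads) in group_by_grad_dtype(params, grads).items():
        print(dtype, [g.name for g in bucket_grads])
```

With real tensors the same grouping would key on each gradient's `dtype` attribute; whether the buckets are then dispatched as separate kernel calls or handled inside the kernel is a design choice this sketch leaves open.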

Environment

No response

1SAA, Apr 06 '22 04:04