lightning-ColossalAI

Error occurs when input data is float32

Open · lizhiqi49 opened this issue · 2 comments

🐛 Bug

I set precision=16 and strategy="colossalai" in the Trainer, but training fails when my input data is float32, with the error "RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same". During training, all model parameters had been cast to float16, while my inputs were still float32.
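The mismatch itself can be reproduced without Lightning or ColossalAI. The sketch below assumes that what the strategy does to the parameters behaves like calling `.half()` on the module; the small `nn.Conv2d` layer and the `forward_or_error` helper are hypothetical stand-ins, not the reporter's model. It runs on CPU, so the error text differs from the CUDA message above and may vary between PyTorch versions.

```python
import torch
from torch import nn


def forward_or_error(module: nn.Module, inp: torch.Tensor) -> str:
    """Run one forward pass; return "ok" or the text of the RuntimeError raised."""
    try:
        module(inp)
        return "ok"
    except RuntimeError as err:
        return str(err)


x = torch.randn(1, 3, 16, 16)  # float32 input data, as in the report

# float32 parameters: stands in for the run without strategy="colossalai".
float_layer = nn.Conv2d(3, 8, kernel_size=3)
# float16 parameters: stands in for what the debugger showed under ColossalAI.
half_layer = nn.Conv2d(3, 8, kernel_size=3).half()

print(forward_or_error(float_layer, x))  # ok
print(forward_or_error(half_layer, x))   # dtype-mismatch RuntimeError (wording varies by version/device)
```

The convolution does not promote dtypes, so a float32 input against float16 weights fails regardless of how the weights ended up in float16.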

Is this expected? If I keep everything else unchanged and only remove strategy="colossalai", my debugger shows the model parameters as float32 and training runs without errors. I also tried keeping strategy="colossalai" and casting all inputs to float16, and that works too.
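The float16 workaround described above can be written so that it works both with and without the strategy: cast floating-point inputs to whatever dtype the model's parameters currently have. This is a sketch under stated assumptions: `cast_to_param_dtype` is a hypothetical helper (not part of Lightning or ColossalAI), it assumes all parameters share one dtype, and inside a LightningModule it could be called, for example, at the start of `training_step`. The forward check uses a float64 layer because float16 convolution may not be supported on CPU in some PyTorch versions.

```python
import torch
from torch import nn


def cast_to_param_dtype(module: nn.Module, batch: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: cast a floating-point tensor to the module's parameter dtype.

    Assumes every parameter of `module` has the same dtype. Non-floating tensors
    (for example integer class labels) are returned unchanged.
    """
    param_dtype = next(module.parameters()).dtype
    if batch.is_floating_point():
        return batch.to(param_dtype)
    return batch


x = torch.randn(1, 3, 16, 16)  # float32, like the reporter's input data
labels = torch.tensor([0, 1, 2])  # integer labels should not be cast

# float16 parameters, as observed under strategy="colossalai" with precision=16.
half_layer = nn.Conv2d(3, 8, kernel_size=3).half()
print(cast_to_param_dtype(half_layer, x).dtype)       # torch.float16
print(cast_to_param_dtype(half_layer, labels).dtype)  # torch.int64

# Forward check on CPU with float64 parameters (float16 convolution is not run
# here because CPU support for it depends on the PyTorch version).
double_layer = nn.Conv2d(3, 8, kernel_size=3).double()
out = double_layer(cast_to_param_dtype(double_layer, x))
print(out.dtype, tuple(out.shape))  # torch.float64 (1, 8, 14, 14)
```

Reading the dtype from the parameters, rather than hard-coding float16, keeps the same LightningModule usable when the strategy is removed and the parameters stay float32.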

So, do I have to cast all my inputs to float16 to use strategy="colossalai"? I don't think that should be necessary, but this error makes me wonder.

Environment

- Python 3.9.12
- torch 1.12.1
- lightning 2.0
- colossalai 0.2.5
- lightning-colossalai 0.1.0rc1
- CUDA 11.3
- Ubuntu 20.04

lizhiqi49, Mar 28 '23 13:03