lightning-ColossalAI
Error occurred when input data is of Float32
🐛 Bug
I have specified precision=16 and strategy="colossalai" in the Trainer, but an error occurs when my input data is float32: "RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same". During training the model parameters are all cast to float16, while my input stays float32.
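For reference, here is a minimal sketch of roughly how my setup looks. The `ReproModel` class, the random dataset, and the exact Trainer flags are simplified stand-ins rather than my real code; the optimizer uses colossalai's HybridAdam, which (as far as I know) the colossalai strategy expects:

```python
# Minimal reproduction sketch (simplified stand-in, not my actual training code).
import torch
import lightning.pytorch as pl
from torch.utils.data import DataLoader, TensorDataset
from colossalai.nn.optimizer import HybridAdam


class ReproModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Conv2d(3, 8, kernel_size=3)

    def training_step(self, batch, batch_idx):
        (x,) = batch  # x is float32 by default
        # Fails here: float32 input vs. float16 weights under the colossalai strategy.
        return self.net(x).mean()

    def configure_optimizers(self):
        return HybridAdam(self.parameters(), lr=1e-3)


data = DataLoader(TensorDataset(torch.randn(16, 3, 32, 32)), batch_size=4)
trainer = pl.Trainer(
    precision=16,
    strategy="colossalai",
    accelerator="gpu",
    devices=1,
    max_epochs=1,
)
trainer.fit(ReproModel(), data)
```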
Should this be an issue, though? If I keep the same setup but remove strategy="colossalai", the debugger shows the model parameters as float32 and everything works fine. Likewise, if I cast all inputs to float16 and keep the colossalai strategy, training also works.
So, is it necessary to cast all inputs to float16 in order to use strategy="colossalai"? I don't think it should be, but the error above makes me wonder.
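The workaround I am using for now (just my guess at a fix, not what I assume the plugin intends) is to cast floating-point inputs to the parameters' dtype in a batch-transfer hook so they match the half-precision weights. `CastInputsModel` below extends the `ReproModel` sketch above:

```python
# Workaround sketch (an assumption on my side, not documented plugin behavior):
# mirror the weights' dtype on the inputs before the forward pass.
class CastInputsModel(ReproModel):
    def on_after_batch_transfer(self, batch, dataloader_idx):
        # The strategy has already cast the weights (e.g. to float16);
        # cast floating-point inputs to the same dtype.
        param_dtype = next(self.parameters()).dtype
        return [
            x.to(param_dtype) if torch.is_floating_point(x) else x
            for x in batch
        ]
```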
Environment
- python 3.9.12
- torch 1.12.1
- lightning 2.0
- colossalai 0.2.5
- lightning-colossalai 0.1.0rc1
- cuda 11.3
- ubuntu 20.04