ColossalAI
[BUG]: The training code of reward model may be wrong
🐛 Describe the bug
I'm trying to train a reward model with the example, but after ten epochs of training its eval result still gives dist=nan, acc=0.
Is there anything wrong in the training code?
Environment
Installed as described here.
I found the bug; the loss function may be wrong. Here are the rewards and the loss during training:
chosen reward tensor([-0.0287], device='cuda:0', dtype=torch.float16, grad_fn=<SqueezeBackward1>)
reject reward tensor([-0.0312], device='cuda:0', dtype=torch.float16, grad_fn=<SqueezeBackward1>)
loss tensor(0.6924, device='cuda:0', dtype=torch.float16, grad_fn=<MeanBackward0>)
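For reference, 0.6924 ≈ ln 2 is exactly what a pairwise log-sigmoid ranking loss produces when the chosen and rejected rewards are nearly equal, so the pre-step numbers above look healthy. A minimal sketch of that loss (the example's actual loss class may differ):

```python
import torch
import torch.nn.functional as F

def pairwise_logsig_loss(chosen_reward: torch.Tensor, reject_reward: torch.Tensor) -> torch.Tensor:
    # -log(sigmoid(r_chosen - r_rejected)); equals ln(2) ≈ 0.693 when the two rewards tie
    return -F.logsigmoid(chosen_reward - reject_reward).mean()

# ≈ 0.692, matching the logged loss above up to fp16 rounding
print(pairwise_logsig_loss(torch.tensor([-0.0287]), torch.tensor([-0.0312])))
```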
After a single backward step, the reward scores and the loss become nan:
chosen reward tensor([nan], device='cuda:0', dtype=torch.float16, grad_fn=<SqueezeBackward1>) | 1/100 [00:01<03:15, 1.97s/it, dist=0, acc=0]
reject reward tensor([nan], device='cuda:0', dtype=torch.float16, grad_fn=<SqueezeBackward1>)
loss tensor(nan, device='cuda:0', dtype=torch.float16, grad_fn=<MeanBackward0>)
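For context, NaN appearing right after the first optimizer step is the classic symptom of float16 overflow: fp16 tops out around 65504, so one oversized activation or gradient becomes inf, and the next arithmetic on it produces nan. A tiny illustration (not the training code itself):

```python
import torch

x = torch.tensor([70000.0], dtype=torch.float16)  # exceeds the fp16 maximum (~65504) -> inf
print(x)      # tensor([inf], dtype=torch.float16)
print(x - x)  # inf - inf is undefined -> tensor([nan], dtype=torch.float16)
```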
Any idea about how to solve this problem?
Same problem.
Same problem.
Try using other strategies, like colossalai_zero2.
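For reference, the Chat example selects the strategy via a --strategy argument; a rough sketch of how that mapping might look (the class names and constructor arguments below are assumptions based on the coati example code, so verify them against your checkout):

```python
# Sketch only: ColossalAIStrategy / DDPStrategy / NaiveStrategy and their arguments
# are assumed from coati (applications/Chat); check your local version.
from coati.trainer.strategies import ColossalAIStrategy, DDPStrategy, NaiveStrategy

def build_strategy(name: str):
    if name == 'naive':
        return NaiveStrategy()
    if name == 'ddp':
        return DDPStrategy()
    if name == 'colossalai_zero2':
        # ZeRO stage 2 shards optimizer states and gradients across data-parallel ranks
        return ColossalAIStrategy(stage=2, placement_policy='cuda')
    raise ValueError(f'Unsupported strategy: {name}')
```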
Same problem.
@HuangLK I tried other strategies, but the problem still exists. Why do you think other strategies would solve this problem? Thanks~
I found the problem (maybe): if you delete model = model.to(torch.float16) in the Python file, you get a normal loss value, but the accuracy and distance only change slightly during training. I don't know whether this fully solves the problem.
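If the blanket fp16 cast is the culprit, a hedged alternative to deleting it outright is PyTorch automatic mixed precision, which keeps fp32 master weights and scales the loss so fp16 gradients stay representable. A minimal sketch using generic PyTorch AMP (not the example's trainer; the model(...) call returning a scalar reward is a placeholder):

```python
import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

def train_step(model, optimizer, loss_fn, chosen_ids, reject_ids):
    optimizer.zero_grad()
    with autocast():                        # run forward ops in fp16 where safe, fp32 elsewhere
        chosen_reward = model(chosen_ids)   # placeholder reward-model call
        reject_reward = model(reject_ids)
        loss = loss_fn(chosen_reward, reject_reward)
    scaler.scale(loss).backward()  # scale the loss so small gradients don't underflow in fp16
    scaler.step(optimizer)         # unscales grads; skips the step if they contain inf/nan
    scaler.update()
    return loss.item()
```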
@Luoyang144 Thank you so much for the information. Deleting this line works for me as well: my [dist, acc] increased from [0.01, 0.60] to [0.45, 0.66]. I used the "ddp" strategy and trained on 8 GPUs.
@LuciusMos Thanks for sharing!
Hi @Luoyang144 @LuciusMos @MyHerbTea @YiAthena After verification, this is not a bug in the code but an inappropriate sh command. We have fixed it. Thanks. https://github.com/hpcaitech/ColossalAI/blob/main/applications/Chat/examples/train_rm.sh
Hello, did you finally solve the problem? I used the newest sh command, but the problem (dist=nan, acc=0) still exists.