DeepSpeedExamples
Error when using BLOOMZ for reward model training
Hello, I'm trying to use BLOOMZ for reward model training, and I get this error:
Traceback (most recent call last):
  File "/users5/xydu/ChatGPT/DeepSpeed-Chat/training/step2_reward_model_finetuning/training_scripts/single_node/../../main.py", line 349, in <module>
    main()
  File "/users5/xydu/ChatGPT/DeepSpeed-Chat/training/step2_reward_model_finetuning/training_scripts/single_node/../../main.py", line 303, in main
    reward_score, acc = evaluation_reward(rm_model, eval_dataloader)
  File "/users5/xydu/ChatGPT/DeepSpeed-Chat/training/step2_reward_model_finetuning/training_scripts/single_node/../../main.py", line 249, in evaluation_reward
    outputs = model(**batch)
  File "/users5/xydu/anaconda3/envs/dpchat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/users5/xydu/anaconda3/envs/dpchat/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/users5/xydu/anaconda3/envs/dpchat/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1695, in forward
    loss = self.module(*inputs, **kwargs)
  File "/users5/xydu/anaconda3/envs/dpchat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/users5/xydu/ChatGPT/DeepSpeed-Chat/training/utils/model/reward_model.py", line 97, in forward
    assert divergence_ind > 0
AssertionError
After printing divergence_ind I found that it is 0. If I change assert divergence_ind > 0 to assert divergence_ind >= 0, will this affect the program?
I also ran into this problem. Has it been solved?
This problem is due to the 560m and 7b1 BLOOMZ models using left-padding by default, which is really weird :( You can change the padding style to right-padding to avoid it. BTW, changing ">" to ">=" will not affect the program. However, this code is designed for right-padding, so left-padding will give completely wrong results.
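For anyone wondering why left-padding trips that assert: the reward model compares the chosen and rejected sequences token by token and uses the first position where they differ as divergence_ind. With right-padding both sequences start with the shared prompt, so that index lands after the prompt; with left-padding the two sequences get different amounts of leading pad tokens and already differ at position 0. A minimal sketch with made-up token ids (not the repo code):

```python
import torch

PAD = 0
# Shared prompt [5, 6]; chosen response [7, 8, 9] is longer than rejected response [7, 3]
chosen_right   = torch.tensor([5, 6, 7, 8, 9])        # right-padding keeps the prompt aligned
rejected_right = torch.tensor([5, 6, 7, 3, PAD])
chosen_left    = torch.tensor([5, 6, 7, 8, 9])        # left-padding shifts the shorter sequence
rejected_left  = torch.tensor([PAD, 5, 6, 7, 3])

def first_divergence(a, b):
    # first index where the two sequences differ
    return (a != b).nonzero()[0].item()

print(first_divergence(chosen_right, rejected_right))  # 3 -> assert divergence_ind > 0 passes
print(first_divergence(chosen_left, rejected_left))    # 0 -> assert divergence_ind > 0 fails
```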
@LuciusMos Thank you! By the way, for others using BLOOM: I advise adding 1e-7 to the difference of the two sentences' rewards; it will help you avoid an inf loss during training.
Reward model training succeeded, but evaluating the reward model with rw_eval.py via the command python rw_eval.py --model_name_or_path reward_model/bloom-560m --num_padding_at_beginning 0 gives this error:
OSError: Can't load tokenizer for 'reward_model/bloom-560m'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'reward_model/bloom-560m' is the correct path to a directory containing all relevant files for a BloomTokenizerFast tokenizer.
All files in reward_model/bloom-560m:
├── config.json
├── merges.txt
├── pytorch_model.bin
├── training.log
└── vocab.json
However, if I choose an OPT model in step 2, rw_eval.py works fine.
@cokuehuang Maybe you should upgrade your transformers version.
My transformers version is 4.29.0.dev0.
Maybe transformers/src/transformers/models/bloom/tokenization_bloom_fast.py needs VOCAB_FILES_NAMES = {"tokenizer_file": "tokenizer.json"}, but the BLOOM training output from step 2 does not contain this file. However, OPT's VOCAB_FILES_NAMES = {"vocab_file": "vocab.json", "merges_file": "merges.txt", "tokenizer_file": "tokenizer.json"}.
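A possible workaround (my own suggestion, not something the step-2 script does for you): save the BLOOM tokenizer into the checkpoint directory yourself, so that rw_eval.py finds a tokenizer.json next to the model weights. Sketch, assuming bigscience/bloomz-560m was the base model:

```python
from transformers import AutoTokenizer

# Re-download the fast tokenizer of the base model and write its files
# (tokenizer.json, tokenizer_config.json, ...) into the step-2 output directory.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-560m", use_fast=True)
tokenizer.save_pretrained("reward_model/bloom-560m")
```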
How do I change the padding style to right-padding?
@lc222 Just add the padding_side="right" kwarg in the tokenizer init function, for example:
tokenizer = load_hf_tokenizer(args.model_name_or_path, fast_tokenizer=True, padding_side="right")
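If you are not going through the repo's helper, the same fix works directly on a Hugging Face tokenizer (a sketch; bigscience/bloomz-560m is just an example checkpoint):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-560m", use_fast=True)
tokenizer.padding_side = "right"  # the bloomz checkpoints default to "left"
print(tokenizer.padding_side)     # right
```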
I set the padding side to right and clamped the loss to avoid inf, as suggested above. Training runs without error, but it reports "Grad overflow" at every iteration. How did you solve that?
@LiinXemmon Hi, this is caused by log(0), which returns inf. I think you should add a very small value (like 1e-7) to the difference of the two sentences' rewards; it will help you avoid an inf loss during training.
Hi Luoyang, I have added 1e-7 in the reward_model.py file under the utils/model folder, but it still hits the inf loss issue. With zero_stage = 3, the loss scale drops to the minimum (1 here) and the error is raised immediately after training starts. Changing to zero_stage = 0 constantly shows the Grad Overflow warning, though it can be trained.
I solved "Grad overflow" by using bf16 rather than the default fp16. Adding 1e-7 to the reward_model.py file works for me to avoid inf loss. I modified the line as
loss += -torch.log( torch.sigmoid(c_truncated_reward - r_truncated_reward)+1e-7).mean()
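A small caveat on the +1e-7 trick: it slightly biases the loss, and sigmoid() can still underflow to 0 for large reward gaps. A numerically safer variant (my sketch, reusing the c_truncated_reward / r_truncated_reward names from reward_model.py) is to compute the log-sigmoid directly, i.e. loss += -F.logsigmoid(c_truncated_reward - r_truncated_reward).mean():

```python
import torch
import torch.nn.functional as F

# Dummy reward gaps standing in for (c_truncated_reward - r_truncated_reward)
diff = torch.tensor([-110.0, 0.4])

naive  = -torch.log(torch.sigmoid(diff)).mean()  # sigmoid(-110) underflows to 0 -> log(0) = -inf
stable = -F.logsigmoid(diff).mean()              # evaluated in log-space, stays finite
print(naive, stable)                             # inf vs. roughly 55.3
```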
How do I use bf16 rather than fp16?
I changed fp16 to bf16 in DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/utils/ds_utils.py, like this:
return {
    "train_batch_size": GLOBAL_BATCH_SIZE,
    "train_micro_batch_size_per_gpu": MICRO_BATCH_SIZE,
    "steps_per_print": 10,
    "zero_optimization": zero_opt_dict,
    "bf16": {  # changed from fp16 to bf16
        "enabled": True,
        "loss_scale_window": 100
    },
    "gradient_clipping": 1.0,
    "prescale_gradients": False,
    "wall_clock_breakdown": False,
    "hybrid_engine": {
        "enabled": enable_hybrid_engine,
        "max_out_tokens": max_out_tokens,
        "inference_tp_size": inference_tp_size,
        "release_inference_cache": release_inference_cache,
        "pin_parameters": pin_parameters,
        "tp_gather_partition_size": tp_gather_partition_size,
    },
}
However, I am not sure whether this is THE right way to do it.
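One extra check before flipping the config (my suggestion, not from this thread): bf16 needs hardware support (NVIDIA Ampere or newer), so it is worth verifying that first. Also, since bf16 has the same exponent range as fp32, no dynamic loss scaling is applied, which is why the overflow messages disappear; as far as I know the loss_scale_window entry only matters for fp16 and can be dropped.

```python
import torch

# True only if CUDA is available and the GPU supports bfloat16 compute
print(torch.cuda.is_available() and torch.cuda.is_bf16_supported())
```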
great job