
Error when using BLOOMZ for reward model training

Luoyang144 opened this issue

Hello, I'm trying to use BLOOMZ for reward model training, and I get this error:

Traceback (most recent call last):
  File "/users5/xydu/ChatGPT/DeepSpeed-Chat/training/step2_reward_model_finetuning/training_scripts/single_node/../../main.py", line 349, in <module>
    main()
  File "/users5/xydu/ChatGPT/DeepSpeed-Chat/training/step2_reward_model_finetuning/training_scripts/single_node/../../main.py", line 303, in main
    reward_score, acc = evaluation_reward(rm_model, eval_dataloader)
  File "/users5/xydu/ChatGPT/DeepSpeed-Chat/training/step2_reward_model_finetuning/training_scripts/single_node/../../main.py", line 249, in evaluation_reward
    outputs = model(**batch)
  File "/users5/xydu/anaconda3/envs/dpchat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/users5/xydu/anaconda3/envs/dpchat/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/users5/xydu/anaconda3/envs/dpchat/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1695, in forward
    loss = self.module(*inputs, **kwargs)
  File "/users5/xydu/anaconda3/envs/dpchat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/users5/xydu/ChatGPT/DeepSpeed-Chat/training/utils/model/reward_model.py", line 97, in forward
    assert divergence_ind > 0
AssertionError

After printing divergence_ind I found it is 0. If I change assert divergence_ind > 0 to assert divergence_ind >= 0, will this affect the program?

Luoyang144 avatar Apr 18 '23 02:04 Luoyang144

I also ran into this problem. Has it been solved?

cokuehuang avatar Apr 23 '23 03:04 cokuehuang

This problem is because the 560m & 7b1 BLOOMZ models use left-padding by default, which is really weird :( You can switch the padding style to right-padding to avoid it. BTW, changing ">" to ">=" will not affect the program. However, this code is designed for right-padding, so left-padding will produce completely wrong results.

LuciusMos avatar Apr 25 '23 13:04 LuciusMos
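
To illustrate the point above, here is a toy sketch (made-up token ids and pad value, not the actual reward_model.py implementation) of why left-padding can make the first divergence index land at 0, while right-padding keeps it past the shared prompt:

import torch

pad = 0
prompt = [101, 102, 103]             # shared prompt tokens
chosen = prompt + [5, 6]             # chosen response is shorter
rejected = prompt + [7, 8, 9]        # rejected response is longer
L = max(len(chosen), len(rejected))  # pad the pair to the longer sequence

def right_pad(seq): return seq + [pad] * (L - len(seq))
def left_pad(seq):  return [pad] * (L - len(seq)) + seq

# Right padding: both sequences start with the same prompt prefix,
# so the first differing index comes after the prompt.
c, r = torch.tensor(right_pad(chosen)), torch.tensor(right_pad(rejected))
print((c != r).nonzero()[0].item())  # 3 -> assert divergence_ind > 0 passes

# Left padding: here the longer sequence needs no pads, so the pair
# already differs at index 0 (pad token vs. prompt token).
c, r = torch.tensor(left_pad(chosen)), torch.tensor(left_pad(rejected))
print((c != r).nonzero()[0].item())  # 0 -> assert divergence_ind > 0 fails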

@LuciusMos Thank you! By the way, for others using BLOOM, I advise adding 1e-7 to the difference between the two sentences' rewards; it will help you avoid inf loss during training.

Luoyang144 avatar Apr 26 '23 05:04 Luoyang144

Reward model training succeeds, but evaluating the reward model with rw_eval.py using this command:

python rw_eval.py --model_name_or_path reward_model/bloom-560m --num_padding_at_beginning 0

gives this error:

OSError: Can't load tokenizer for 'reward_model/bloom-560m'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'reward_model/bloom-560m' is the correct path to a directory containing all relevant files for a BloomTokenizerFast tokenizer.

All files in reward_model/bloom-560m:
├── config.json
├── merges.txt
├── pytorch_model.bin
├── training.log
└── vocab.json

However, if I choose an OPT model in step 2, rw_eval.py works fine.

cokuehuang avatar Apr 26 '23 05:04 cokuehuang

@cokuehuang Maybe you should upgrade your transformers version.

Luoyang144 avatar Apr 26 '23 05:04 Luoyang144

My transformers version is 4.29.0.dev0.

cokuehuang avatar Apr 26 '23 05:04 cokuehuang

Maybe transformers/src/transformers/models/bloom/tokenization_bloom_fast.py needs VOCAB_FILES_NAMES = {"tokenizer_file": "tokenizer.json"}, but the step-2 BLOOM training output does not contain that file. However, OPT's VOCAB_FILES_NAMES = {"vocab_file": "vocab.json", "merges_file": "merges.txt", "tokenizer_file": "tokenizer.json"}.

cokuehuang avatar Apr 26 '23 07:04 cokuehuang
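
One possible workaround, following the observation above, is a sketch like the one below: load the fast tokenizer from the original base checkpoint and save it into the step-2 output directory, so that rw_eval.py finds a tokenizer.json there. The base model name is an assumption (use whatever checkpoint you fine-tuned); the output directory is the one from the command in the thread.

from transformers import AutoTokenizer

base_model = "bigscience/bloomz-560m"       # assumption: the base checkpoint used in step 2
finetuned_dir = "reward_model/bloom-560m"   # step-2 output directory from the command above

tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)
tokenizer.save_pretrained(finetuned_dir)    # writes tokenizer.json and related tokenizer files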

This problem is because the 560m & 7b1 BLOOMZ models use left-padding by default, which is really weird :( You can switch the padding style to right-padding to avoid it. BTW, changing ">" to ">=" will not affect the program. However, this code is designed for right-padding, so left-padding will produce completely wrong results.

how ?

lc222 avatar May 02 '23 13:05 lc222

@lc222 Just add the padding_side="right" kwarg in the tokenizer init call. For example: tokenizer = load_hf_tokenizer(args.model_name_or_path, fast_tokenizer=True, padding_side="right")

LuciusMos avatar May 04 '23 07:05 LuciusMos
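
If your copy of load_hf_tokenizer does not accept a padding_side argument, a plain Hugging Face tokenizer can be configured the same way (sketch only; the model name is just an example):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-560m", use_fast=True)
tokenizer.padding_side = "right"   # per the thread above, BLOOMZ defaults to left padding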

I set the padding side to right and clamped the loss to avoid inf. Training runs without error, but it reports "Grad overflow" at every iteration. How did you solve that?

LiinXemmon avatar May 04 '23 09:05 LiinXemmon

@LiinXemmon Hi, this is caused by log(0), which returns inf. I think you should add a very small value (like 1e-7) to the difference of the two sentences' rewards; it will help you avoid inf loss during training.

Luoyang144 avatar May 05 '23 09:05 Luoyang144

Hi Luoyang, I added 1e-7 in the reward_model.py file under the utils/model folder, but it still hits the inf loss issue. With zero_stage = 3, the loss scale drops to the minimum (1 here) and the error is raised immediately after training starts. With zero_stage = 0 it can train, but it constantly reports the Grad Overflow problem.

lukaswangbk avatar May 05 '23 14:05 lukaswangbk

I solved "Grad overflow" by using bf16 rather than the default fp16. Adding 1e-7 to the reward_model.py file works for me to avoid inf loss. I modified the line as loss += -torch.log( torch.sigmoid(c_truncated_reward - r_truncated_reward)+1e-7).mean()

LiinXemmon avatar May 05 '23 15:05 LiinXemmon
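
For reference, a numerically more stable way to write the same pairwise term is torch.nn.functional.logsigmoid, which avoids the sigmoid underflowing to 0 in half precision without needing the epsilon. This is a sketch of an alternative, not the fix used in the thread; c_truncated_reward and r_truncated_reward are the reward slices named above, not variables defined here.

import torch
import torch.nn.functional as F

def pairwise_loss(c_truncated_reward, r_truncated_reward):
    # Equivalent to -log(sigmoid(c - r)), computed in a numerically stable way.
    return -F.logsigmoid(c_truncated_reward - r_truncated_reward).mean()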

How do I use bf16 rather than fp16?

zhan0903 avatar May 09 '23 08:05 zhan0903

I changed fp16 to bf16 in this file: DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/utils/ds_utils.py. Like this:

return {
        "train_batch_size": GLOBAL_BATCH_SIZE,
        "train_micro_batch_size_per_gpu": MICRO_BATCH_SIZE,
        "steps_per_print": 10,
        "zero_optimization": zero_opt_dict,
        "bf16": {  # changed from fp16 to bf16
            "enabled": True,
            "loss_scale_window": 100
        },
        "gradient_clipping": 1.0,
        "prescale_gradients": False,
        "wall_clock_breakdown": False,
        "hybrid_engine": {
            "enabled": enable_hybrid_engine,
            "max_out_tokens": max_out_tokens,
            "inference_tp_size": inference_tp_size,
            "release_inference_cache": release_inference_cache,
            "pin_parameters": pin_parameters,
            "tp_gather_partition_size": tp_gather_partition_size,
        }
    }

However, I am not sure whether this is THE right way to do it.

LiGhtime avatar May 17 '23 09:05 LiGhtime

great job

scarydemon2 avatar Jun 09 '23 10:06 scarydemon2