Chinese-LLaMA-Alpaca ValueError: Attempting to unscale FP16 gradients.

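Detailed description of the problem

I want to continue pre-training on top of the existing Chinese-LLaMA-Plus-7B. I first merged the original LLaMA with chinese-llama-plus-lora-7b to obtain Chinese-LLaMA-Plus-7B, then pre-trained the model as described in the pre-training script. I did not use deepspeed, and the run ended with the error "ValueError: Attempting to unscale FP16 gradients." My torch version is 1.12.0 and my transformers version is 4.28.1.

Screenshots or logs

[screenshot: 2023-05-11_14-09]

Required checks

[ ] Which model is affected: LLaMA
[ ] Issue type:
- Model pre-training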

May 11 '23 06:05 klykq111

Which version of peft are you using?

May 11 '23 07:05 ymcui

0.3.0.dev0

May 11 '23 07:05 klykq111

Which parameters are set as trainable? Is the model loaded in fp16?

May 11 '23 07:05 iMountTai

Here is my training script:

#!/bin/bash
lr=2e-4
lora_rank=8
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.1

pretrained_model="models/ziqingyang_chinese-llama-plus-7b"
chinese_tokenizer_path="models/ziqingyang_chinese-llama-plus-7b"
dataset_dir="data_clm"
data_cache="data_cache"
per_device_batch_size=1
seed=666
training_epochs=1
gradient_accumulation_steps=1
output_dir="llama_finetune"

CUDA_VISIBLE_DEVICES=1 python scripts/run_clm_pt_with_peft.py \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${chinese_tokenizer_path} \
    --dataset_dir ${dataset_dir} \
    --data_cache_dir ${data_cache} \
    --validation_split_percentage 0.001 \
    --per_device_train_batch_size ${per_device_batch_size} \
    --per_device_eval_batch_size ${per_device_batch_size} \
    --do_train \
    --seed ${seed} \
    --fp16 \
    --num_train_epochs ${training_epochs} \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.05 \
    --weight_decay 0.01 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 3 \
    --save_steps 500 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 8 \
    --block_size 512 \
    --output_dir ${output_dir} \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --trainable ${lora_trainable} \
    --modules_to_save ${modules_to_save} \
    --lora_dropout ${lora_dropout} \
    --torch_dtype float16

May 11 '23 07:05 klykq111

Did you install the peft version specified in the script?

May 11 '23 07:05 iMountTai

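These are all the libraries in my current environment. I installed the peft version specified in the script, and the installation reported no errors. [screenshot: 2023-05-11_15-55]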

May 11 '23 07:05 klykq111

Try turning off fp16.

May 11 '23 08:05 iMountTai

After removing "--fp16", I got a new error: "RuntimeError: expected scalar type Half but found Float".

May 11 '23 08:05 klykq111

I looked at these two discussions for reference: https://huggingface.co/CompVis/stable-diffusion-v1-4/discussions/10 https://github.com/d8ahazard/sd_dreambooth_extension/issues/37 and tried changing https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/fb27d3ba607b0591610b874b580e8571859521f8/scripts/run_clm_pt_with_peft.py#L585 to:

        with torch.autocast("cuda"):
            train_result = trainer.train(resume_from_checkpoint=checkpoint)

With that change training runs, but only the first logged loss has a value; all later ones are 0: [screenshot: 2023-05-11_17-03]

May 11 '23 09:05 klykq111

Regarding the "RuntimeError: expected scalar type Half but found Float" error that appears after removing "--fp16": alternatively, adding model.half() on the line after https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/fb27d3ba607b0591610b874b580e8571859521f8/scripts/run_clm_pt_with_peft.py#L557 also lets training run, but the loss problem is still not solved.

May 11 '23 11:05 klykq111

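For the loss problem, I tried removing the argument at https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/fb27d3ba607b0591610b874b580e8571859521f8/scripts/run_clm_pt_with_peft.py#L556, so that "embed_tokens,lm_head" are no longer trained, and the loss became normal. However, the share of trainable parameters dropped from 6.215% to 0.2895%.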
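
The two percentages can be checked with a little arithmetic. The sketch below assumes the standard LLaMA-7B shapes (hidden size 4096, MLP size 11008, 32 layers) and a vocabulary of 49953 tokens for the merged model; neither is stated in this thread, while the rank of 8 and the seven target projections come from the training script. Under those assumptions the LoRA-only share comes out at about 0.2895%. The 6.215% figure is reproduced only when the denominator is the same total as in the LoRA-only case, which is an observation about the numbers and not a statement about how peft counts parameters.

```python
# Assumed LLaMA-7B shapes; VOCAB = 49953 is an assumption about the merged tokenizer.
HIDDEN, INTERMEDIATE, LAYERS, VOCAB = 4096, 11008, 32, 49953
RANK = 8  # lora_rank from the training script


def lora_params(d_in: int, d_out: int, rank: int = RANK) -> int:
    """A LoRA adapter on a d_in x d_out linear layer adds rank * (d_in + d_out) weights."""
    return rank * (d_in + d_out)


# q_proj, k_proj, v_proj, o_proj are HIDDEN x HIDDEN.
attn_lora = 4 * lora_params(HIDDEN, HIDDEN)
# gate_proj and up_proj map HIDDEN -> INTERMEDIATE, down_proj maps back.
mlp_lora = 2 * lora_params(HIDDEN, INTERMEDIATE) + lora_params(INTERMEDIATE, HIDDEN)
lora_total = LAYERS * (attn_lora + mlp_lora)

# Base model: embeddings + lm_head, per-layer attention/MLP/two norms, final norm.
embed_and_head = 2 * VOCAB * HIDDEN
per_layer = 4 * HIDDEN * HIDDEN + 3 * HIDDEN * INTERMEDIATE + 2 * HIDDEN
base_total = LAYERS * per_layer + HIDDEN + embed_and_head

all_params = base_total + lora_total
pct_lora_only = 100 * lora_total / all_params
pct_with_saved = 100 * (lora_total + embed_and_head) / all_params

print(f"LoRA only:               {lora_total:,} trainable ({pct_lora_only:.4f}%)")
print(f"LoRA + embed_tokens/lm_head: {lora_total + embed_and_head:,} trainable ({pct_with_saved:.3f}%)")
```

Almost all of the difference is the two vocabulary-sized matrices, which is why dropping modules_to_save shrinks the trainable share so sharply.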

May 11 '23 11:05 klykq111

I see you have the bitsandbytes dependency installed. Is int8 being applied automatically?

May 11 '23 12:05 iMountTai

The modules_to_save feature has been somewhat unstable in peft versions from the recent past, so it is best to stick to this version: https://github.com/huggingface/peft/tree/13e53fc

May 11 '23 12:05 airaria

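Replying to iMountTai's question about whether int8 is being applied automatically because bitsandbytes is installed: No. I only tried load_in_8bit after hitting "ValueError: Attempting to unscale FP16 gradients." to see whether it would solve the problem, which is why the library is installed. In my tests, using load_in_8bit and removing --fp16 does resolve both "ValueError: Attempting to unscale FP16 gradients." and "RuntimeError: expected scalar type Half but found Float", but the loss problem still only goes away when modules_to_save is removed.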

May 12 '23 01:05 klykq111

@klykq111 After its recent adaptations, the current script is not suited to training without deepspeed. After pulling the latest repository, please run the code with the settings in the corresponding script. If you change training settings such as fp16 or deepspeed, normal training is not guaranteed.
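
The combinations reported so far in this thread can be turned into a small pre-flight check on the command line. This is only a sketch: preflight_warnings is a hypothetical helper, not part of the repository, and the rules encode what participants observed here, not documented behaviour of the script.

```python
from typing import List, Optional


def preflight_warnings(argv: List[str]) -> List[str]:
    """Return warnings for argument combinations reported as problematic in this thread."""

    def value_of(flag: str) -> Optional[str]:
        # Value following a flag, or None if the flag is absent or takes no value.
        if flag in argv:
            i = argv.index(flag)
            if i + 1 < len(argv) and not argv[i + 1].startswith("--"):
                return argv[i + 1]
        return None

    uses_deepspeed = "--deepspeed" in argv
    warnings = []
    if "--fp16" in argv and value_of("--torch_dtype") == "float16" and not uses_deepspeed:
        warnings.append(
            "--fp16 with --torch_dtype float16 and no --deepspeed: "
            "reported to raise 'Attempting to unscale FP16 gradients'"
        )
    if "--modules_to_save" in argv and not uses_deepspeed:
        warnings.append(
            "--modules_to_save without --deepspeed: "
            "reported to give a loss of 0 after the first step"
        )
    return warnings


# Arguments modelled on the original training script (abbreviated).
original_args = [
    "--fp16", "--torch_dtype", "float16",
    "--modules_to_save", "embed_tokens,lm_head",
]
for message in preflight_warnings(original_args):
    print("WARNING:", message)
```

A clean result from this check does not guarantee a working run; it only rules out the specific combinations described above.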

May 13 '23 09:05 iMountTai

@iMountTai I tested it: after adding deepspeed to the training run, everything works so far. Thank you very much!

May 15 '23 06:05 klykq111

Hi, I am facing the same issue. I am getting the error "ValueError: Attempting to unscale FP16 gradients."

If I turn off fp16, I run out of memory and training does not start.

My startup command:

python src/models/run_clm_pt_with_peft.py \
    --deepspeed /home/git_repos/thesis/src/models/ds_zero2_no_offload.json \
    --model_name_or_path /home/llama-hf \
    --tokenizer_name_or_path /home/llama-hf \
    --dataset_dir /home/git_repos/thesis/datasets/mlm_data/ \
    --data_cache_dir temp_data_cache_dir \
    --validation_split_percentage 0.001 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --do_train \
    --seed 25 \
    --fp16 \
    --num_train_epochs 1 \
    --lr_scheduler_type cosine \
    --learning_rate 2e-4 \
    --warmup_ratio 0.05 \
    --weight_decay 0.01 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 3 \
    --save_steps 200 \
    --gradient_accumulation_steps 8 \
    --preprocessing_num_workers 8 \
    --block_size 512 \
    --output_dir output_dir \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank 8 \
    --lora_alpha 32 \
    --trainable "q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj" \
    --modules_to_save "embed_tokens,lm_head" \
    --lora_dropout 0.05 \
    --torch_dtype float16 \
    --ddp_find_unused_parameters False

Please provide suggestions as to how you resolved the issue.

Update: @klykq111, after adding the autocast change you mentioned, my training started, but the loss stays at 0.0 :( Where should I add the model.half() line in the run_clm_pt_with_peft.py file?

Jun 28 '23 18:06 lathashree01

Hello, adding the line model.half() to run_clm_pt_with_peft.py did not solve the problem of the loss being 0. Later I switched to the latest code and made sure to train with deepspeed, and everything worked fine. I suggest you re-pull the latest code and make sure the training command is consistent with the instructions in the wiki.

Jun 29 '23 06:06 klykq111

Thank you. It's training fine now: no errors, and I at least see different loss values :)

At first I was running the Python file on its own to verify that it worked, planning to run it as a job with the final script afterwards. I think my issue was caused by DeepSpeed and something in my bitsandbytes installation.

I created a new environment and installed everything from scratch, then ran the original script. I hit some CUDA issues but eventually resolved them, and now things look fine.

Jun 29 '23 21:06 lathashree01

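This happens because amp requires trainable parameters to be torch.float32. The LoRA module parameters are torch.float32, but the parameters named in modules_to_save='embed_tokens,lm_head' are initialized as torch.float16 in from_pretrained and still take part in amp gradient updates, hence the error. Solutions:
1. For LLaMA models, manually convert the embed_tokens and lm_head layers to torch.float32.
2. For any model, iterate over the parameters and manually set every parameter with requires_grad to torch.float32.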

model.print_trainable_parameters()
# monkey patch: amp needs every trainable parameter in float32
logger.info(f"model.modules_to_save: {model.modules_to_save}")
trainable_not_float32 = [name for name, param in model.named_parameters() if param.requires_grad and param.dtype != torch.float32]
if len(trainable_not_float32) > 0:
    logger.warning(f"{trainable_not_float32} trainable but not float32.")
    # for llama case
    model.base_model.model.model.embed_tokens = model.base_model.model.model.embed_tokens.float()
    model.base_model.model.lm_head = model.base_model.model.lm_head.float()
    # for common case (assign to param.data; rebinding the loop variable would not change the model)
    # for param in model.parameters():
    #     if param.requires_grad and param.dtype != torch.float32:
    #         param.data = param.data.float()
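
The model-agnostic variant of this fix can be tried on a toy module before touching a real checkpoint. This is a minimal sketch assuming PyTorch is installed; upcast_trainable_params and the Toy class are hypothetical stand-ins, with Toy playing the role of a half-precision model whose head is trainable and whose other layers are frozen.

```python
from typing import List

import torch
from torch import nn


def upcast_trainable_params(model: nn.Module) -> List[str]:
    """Cast every trainable floating-point parameter that is not float32 to float32, in place."""
    changed = []
    for name, param in model.named_parameters():
        if param.requires_grad and param.is_floating_point() and param.dtype != torch.float32:
            # Assign to .data so the parameter object held by the module is updated.
            param.data = param.data.to(torch.float32)
            changed.append(name)
    return changed


class Toy(nn.Module):
    """Hypothetical stand-in: a frozen body plus a trainable head."""

    def __init__(self) -> None:
        super().__init__()
        self.frozen = nn.Linear(4, 4)
        self.head = nn.Linear(4, 2)


toy = Toy().half()  # mimic loading with torch_dtype=float16
for p in toy.frozen.parameters():
    p.requires_grad = False  # mimic frozen base weights

changed = upcast_trainable_params(toy)
print("upcast:", changed)
print("head dtype:", toy.head.weight.dtype, "| frozen dtype:", toy.frozen.weight.dtype)
```

Only the trainable parameters are upcast, so the frozen weights keep their float16 memory footprint.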

Jul 26 '23 00:07 crj1998

@crj1998 If I make this change in the code, can I still pass fp16 to the trainer?

Dec 11 '23 08:12 ZeyuTeng96

In my case, peft 0.7.0 had this problem. After I downgraded peft to 0.4.0, the issue no longer occurred.

Dec 31 '23 03:12 JohnHwangzn

@JohnHwangzn Nice! Downgrading peft as you described fixed it for me. Thanks!

Jan 06 '24 13:01 ZHANGJINKUI

@klykq111 You reported that everything worked after adding deepspeed. Why do I still get "ValueError: Attempting to unscale FP16 gradients" on Windows 10 with CUDA 11.8, torch 2.0.1, deepspeed, and fp16?

Jan 22 '24 09:01 Suiji12

@klykq111 After I enable fp16, the loss is normal on the first step and 0 on every step after that. Why is that? These are my arguments:

--model_name_or_path D:/jjx/moxing/llama-2-7b-chat-hf
--tokenizer_name_or_path D:/jjx/moxing/llama-2-7b-chat-hf
--dataset_dir D:/jjx/Retrieve-Rewrite-Answer-main/Retrieve-Rewrite-Answer-main/finetune-llama/MetaQA/train
--deepspeed D:/jjx/Retrieve-Rewrite-Answer-main/Retrieve-Rewrite-Answer-main/finetune-llama/ds_zero2_no_offload.json
--validation_split_percentage 0.001
--per_device_train_batch_size 4
--per_device_eval_batch_size 4
--do_train
--do_eval
--seed 126
--num_train_epochs 10
--lr_scheduler_type cosine
--learning_rate 1e-4
--warmup_ratio 0.03
--weight_decay 0
--logging_strategy steps
--logging_steps 10
--save_strategy epoch
--save_total_limit 2
--evaluation_strategy epoch
--gradient_accumulation_steps 8
--preprocessing_num_workers 8
--max_seq_length 1024
--output_dir D:/jjx/Retrieve-Rewrite-Answer-main/Retrieve-Rewrite-Answer-main/finetune-llama/lora7b
--overwrite_output_dir
--ddp_timeout 30000
--logging_first_step True
--lora_rank 64
--lora_alpha 128
--trainable "q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
--modules_to_save "embed_tokens,lm_head"
--lora_dropout 0.05
--torch_dtype float16
--validation_file D:/jjx/Retrieve-Rewrite-Answer-main/Retrieve-Rewrite-Answer-main/finetune-llama/MetaQA/dev.json
--gradient_checkpointing
--ddp_find_unused_parameters False
--load_best_model_at_end True

Jan 22 '24 09:01 Suiji12

@klykq111 When you say you removed the modules_to_save argument, do you mean deleting every occurrence of modules_to_save in this .py file?

Jan 22 '24 10:01 Suiji12

@klykq111 May I ask which version of the deepspeed library you are using?
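
Since several replies in this thread turn on library versions, it can help to post them all at once. The sketch below uses only the Python standard library (importlib.metadata, Python 3.8 or later); the package list is an assumption based on the libraries mentioned in this thread.

```python
from importlib import metadata
from typing import Dict, Optional, Sequence

# Packages mentioned in this thread; adjust as needed.
PACKAGES = ("torch", "transformers", "peft", "deepspeed", "bitsandbytes")


def installed_versions(names: Sequence[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is not installed."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions()
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```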

Jan 23 '24 01:01 Suiji12

@Suiji12 Did you manage to solve this? I don't quite understand how to add the deepspeed training arguments.
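
In the commands posted earlier in this thread, deepspeed is enabled by passing --deepspeed followed by the path to a JSON config file (ds_zero2_no_offload.json). The contents of that file are not shown here, so the sketch below is an assumption: it writes a minimal ZeRO stage 2 config using keys from the DeepSpeed and Hugging Face Trainer documentation, where "auto" lets the Trainer fill in values from its own arguments. The output file name is hypothetical, and the repository's own config should be preferred when available.

```python
import json
import os
import tempfile


def build_zero2_config() -> dict:
    """A minimal ZeRO stage 2 config without offload; an illustration, not the repository's file."""
    return {
        "fp16": {
            "enabled": "auto",      # follows the Trainer's --fp16 flag
            "loss_scale": 0,         # 0 selects dynamic loss scaling
            "initial_scale_power": 16,
        },
        "zero_optimization": {
            "stage": 2,
            "overlap_comm": True,
            "contiguous_gradients": True,
        },
        "gradient_accumulation_steps": "auto",
        "gradient_clipping": "auto",
        "train_batch_size": "auto",
        "train_micro_batch_size_per_gpu": "auto",
    }


config = build_zero2_config()
# Hypothetical output location; any readable path works.
config_path = os.path.join(tempfile.gettempdir(), "ds_zero2_sketch.json")
with open(config_path, "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)

with open(config_path, encoding="utf-8") as f:
    loaded = json.load(f)

print(f"Pass this to the training script: --deepspeed {config_path}")
```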

Jun 19 '24 02:06 UncleshoesXR
