modules_to_save: "ValueError: Attempting to unscale FP16 gradients"
I'm trying to finetune llama with some expanded tokens using resize_token_embeddings() and passing modules_to_save=['embed_tokens', 'lm_head'], but it seems there is some misconfiguration:
Traceback (most recent call last):
File "/home/jonathanasdf/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1662, in train
return inner_training_loop(
File "/home/jonathanasdf/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1962, in _inner_training_loop
self.scaler.unscale_(self.optimizer)
File "/home/jonathanasdf/.local/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 284, in unscale_
optimizer_state["found_inf_per_device"] = self._unscale_grads_(optimizer, inv_scale, found_inf, False)
File "/home/jonathanasdf/.local/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 212, in _unscale_grads_
raise ValueError("Attempting to unscale FP16 gradients.")
So I think the problem above is some incompatibility between int8 and modules_to_save. Using float16 instead of int8 is fine.
But actually, it seems that after https://github.com/huggingface/peft/commit/c21afbe868734c0af8bd4577c4c7acdf366b96d1, setting modules_to_save for a CAUSAL_LM LoraConfig doesn't do anything anymore. I had to revert to before that commit for the setting to actually make those modules trainable and have them saved in the checkpoint.
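For anyone checking whether they hit the same regression, a quick generic check (continuing the sketch above; inspection code added for illustration, not part of the original script):
# Shows the count of trainable vs. total parameters for the PEFT model.
model.print_trainable_parameters()

# List everything that will actually receive gradients.
trainable = [name for name, param in model.named_parameters() if param.requires_grad]
print(trainable)
# If modules_to_save is working, this list contains "modules_to_save" entries for
# embed_tokens and lm_head in addition to the lora_A / lora_B weights.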
Yes, for now, if you use a lower version of peft, it will report the same error as yours. But if I upgrade peft to the latest version (0.3.0.dev0 for now), I can't save my LoRA weights, because the weight file unexpectedly shrinks to 443 bytes. https://github.com/oobabooga/text-generation-webui/issues/1270#issue-1669857486
I am having the same problem here, and it is not solved even though I installed peft from the up-to-date repo. Just like the author of this issue, I'm trying to finetune llama with some expanded tokens using resize_token_embeddings() and passing modules_to_save=['embed_tokens', 'lm_head']. Here is what I am seeing:
- The problem is solved when I delete 'modules_to_save' from LoraConfig.
- The problem is solved if I convert the model to torch.float32 and set fp16=False in the trainer arguments.
- Neither switching load_in_8bit between False and True nor switching low_cpu_mem_usage between True and False when loading the base llama model helps.
- When I delete 'modules_to_save' from LoraConfig, the PeftModel object has both torch.float32 and torch.float16 parameters, but this odd fact does not lead to an error. However, if I set modules_to_save=['embed_tokens', 'lm_head'], the "ValueError: Attempting to unscale FP16 gradients" still happens, even if I use .half() to convert all parameters of the PeftModel to torch.float16 and keep fp16=True in the trainer. @pacman100 Please review this issue, thank you~
My environment: torch==1.13.1, transformers==4.29.0.dev0, peft==0.3.0.dev0, CUDA==11.7, GPU is a Tesla A100 80G.
New idea: the training finally works now. Setting fp16=False makes training super slow and is not memory-friendly.
To avoid "ValueError: Attempting to unscale FP16 gradients", just make sure each trainable param is of type torch.float32. In my case:
model.base_model.model.model.embed_tokens.weight.data = model.base_model.model.model.embed_tokens.weight.data.float()
model.base_model.model.lm_head.weight.data = model.base_model.model.lm_head.weight.data.float()
It seems like a bug on the pytorch side.
so clever, dude. thanks for your idea
I have the same question.
When I try this fix (casting embed_tokens and lm_head to float32), I get the following error:
model.base_model.model.model.embed_tokens.weight.data.float()
File "/project/msi290_uksr/generative_tod/myenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'GPTJForCausalLM' object has no attribute 'model'
Could you show an example of where you actually wrote those two lines?
Right after I called PeftModel.from_pretrained, which combines the LoRA weights with the original model's params.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
I didn't use llama, but I got the same error when fine-tuning BLIP with LoRA. I checked the dtype of the parameters in all the layers and they were all float16. My solution was to change the bias option in LoraConfig from "lora_only" to "none".
config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",  # <--- replaces "lora_only"
    target_modules=["query", "key", "value", "qkv", "text_decoder.cls.predictions.decoder"],
)
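If it helps explain why that works: bias="lora_only" marks the bias terms of the LoRA-targeted layers as trainable, and on a model loaded in float16 those biases are fp16 parameters, which is presumably what triggers the unscale error. A generic check (my own sketch, not from the BLIP script):
# With bias="lora_only" this prints the (fp16) biases that became trainable;
# with bias="none" it should print nothing.
for name, param in model.named_parameters():
    if param.requires_grad and name.endswith(".bias"):
        print(name, param.dtype)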
Any updates on this issue? Still seeing this bug
Which one exactly do you mean? Note that for the case of loading the model in float16, you have to follow the advice given above.
A snippet that should work a little bit more generally:
for param in model.parameters():
if param.requires_grad:
param.data = param.data.to(torch.float32)
This is my use-case test:
It breaks with ValueError("Attempting to unscale FP16 gradients.") under the config below.
model = AutoModelForCausalLM.from_pretrained(
...
torch_dtype=torch.float16,
)
training_args = TrainingArguments(
fp16=True,
...
)
peft_config = LoraConfig(
...
modules_to_save=["embed_tokens", "lm_head"],
)
No error for the cases below:
model = AutoModelForCausalLM.from_pretrained(
...
torch_dtype=torch.float16,
)
training_args = TrainingArguments(
fp16=True,
...
)
peft_config = LoraConfig(
...
modules_to_save=None,
)
model = AutoModelForCausalLM.from_pretrained(
...
torch_dtype=torch.float32,
)
training_args = TrainingArguments(
fp16=True,
...
)
peft_config = LoraConfig(
...
modules_to_save=["embed_tokens", "lm_head"],
)
model = AutoModelForCausalLM.from_pretrained(
...
torch_dtype=torch.float16,
)
training_args = TrainingArguments(
fp16=False,
...
)
peft_config = LoraConfig(
...
modules_to_save=["embed_tokens", "lm_head"],
)
I am confused: how should I understand the relation between torch_dtype, fp16, and modules_to_save?
In general, when you want to use mixed precision (i.e. fp16=True), the weights to be trained should not be loaded as float16. This is always true and has nothing to do with PEFT. When you add modules_to_save with PEFT, it means that new layers are added to the model that will be trained. These layers are copies of the layers they are supposed to replace, so they use fp16 too. Therefore, you get the error. Does this clarify why you see these results?
I will add an entry to our docs via PR #1336 that should hopefully make it easier to debug this issue in the future.
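To see this concretely, a small inspection sketch (assuming a model loaded with torch_dtype=torch.float16 and a LoraConfig with modules_to_save, as above):
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name, param.dtype)
# If the modules_to_save copies of embed_tokens / lm_head show up here as torch.float16,
# those are the trainable fp16 weights that the GradScaler refuses to unscale.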
Thanks so much for clarifying! I now understand that if the model is loaded in float16, the copied layers will cause an error during training.
Just one dummy follow-up: when modules_to_save is None and the original model is loaded in float16, what is the default weight dtype for the LoRA layers? (fp16=True should mean float16 computation, if I am understanding correctly.)
In general, the LoRA layers use the same dtype for their parameters as the original layers, but there can be exceptions (e.g. when using prepare_model_for_kbit_training).
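For completeness, here is a sketch of where the k-bit exception mentioned above fits in (assuming a base model loaded with load_in_8bit=True and the peft_config from earlier; not taken from any script in this thread):
from peft import get_peft_model, prepare_model_for_kbit_training

# Freezes the quantized base weights and casts the remaining non-quantized
# parameters (e.g. layer norms) to fp32, so the resulting dtypes differ from
# what the LoRA layers would otherwise inherit from the base model.
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)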
Previously I tested loading the model in float16, enabling fp16, and setting modules_to_save to None. If the LoRA layers inherit float16 from the model dtype, shouldn't I expect to see a similar error message? But the training actually finished successfully.
Interesting. Not sure what exactly happened there, but here is a test that shows that trying to load a model in fp16 and training it with AMP results in the error:
https://github.com/huggingface/peft/pull/1336/files#diff-037b6550b3d063119b60801a751c3c5c97eab8e912ae8370c2dd5569d38305adR1226
I’ll give a demo tomorrow.
I have a very simple script https://github.com/hengjiUSTC/learn-llm/blob/demo_crash/trl_finetune.py run with python3 trl_finetune.py --config configs/demo-crash.yml
In the configuration:
- fp16 is true: https://github.com/hengjiUSTC/learn-llm/blob/demo_crash/configs/demo_crash.yml#L29
- model is loaded in float16 https://github.com/hengjiUSTC/learn-llm/blob/demo_crash/configs/demo_crash.yml#L6C18-L6C18
Run 1:
When commenting out line https://github.com/hengjiUSTC/learn-llm/blob/demo_crash/trl_finetune.py#L388 (modules_to_save is None), the training process runs correctly.
Run 2: When adding back line https://github.com/hengjiUSTC/learn-llm/blob/demo_crash/trl_finetune.py#L388 (modules_to_save=["embed_tokens", "lm_head"]), the error is raised.
So Run 1 shouldn't have worked if the LoRA weights inherit float16 from the model dtype?
Thanks for providing an example. I tried it (using opt) and it crashed even with modules_to_save=None. Checking the dtypes of the learnable parameters, they are fp16, so the crash is expected. Not sure what the source of the difference is, but either way, I think it's safe to say that when loading in fp16, it's best to cast the trainable weights to fp32. PR #1318 will introduce a convenience function, cast_non_trainable_to_dtype, to do this quickly.
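Until that PR lands, here is an end-to-end sketch of the pattern recommended in this thread: load in fp16, add the adapter with modules_to_save, cast only the trainable parameters to fp32, then train with fp16=True. The model id and hyperparameters are placeholders:
import torch
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

# Base model loaded in half precision (placeholder model id).
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", torch_dtype=torch.float16)

peft_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=8,
    lora_alpha=16,
    modules_to_save=["embed_tokens", "lm_head"],
)
model = get_peft_model(model, peft_config)

# Cast only the trainable parameters (LoRA weights and modules_to_save copies) to fp32;
# the frozen fp16 base weights stay in half precision to save memory.
for param in model.parameters():
    if param.requires_grad:
        param.data = param.data.to(torch.float32)

assert all(p.dtype == torch.float32 for p in model.parameters() if p.requires_grad)

# With the trainable weights in fp32, mixed precision no longer trips the GradScaler.
training_args = TrainingArguments(output_dir="outputs", fp16=True)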
Got it, thanks for checking.