Qwen-VL
[BUG] ValueError: Cannot merge LORA layers when the model is gptq quantized | When merging a LORA-finetuned Qwen-VL-Chat-Int4
是否已有关于该错误的issue或讨论? | Is there an existing issue / discussion for this?
- [X] 我已经搜索过已有的issues和讨论 | I have searched the existing issues / discussions
该问题是否在FAQ中有解答? | Is there an existing answer for this in FAQ?
- [X] 我已经搜索过FAQ | I have searched FAQ
当前行为 | Current Behavior
I finetuned "Qwen-VL-Chat-Int4" with LoRA using the following command:
#!/bin/bash
export CUDA_DEVICE_MAX_CONNECTIONS=1
DIR=`pwd`
MODEL="Qwen/Qwen-VL-Chat-Int4" # See the section for finetuning in README for more information.
DATA="/content/training_dataset.json"
export CUDA_VISIBLE_DEVICES=0
python finetune.py \
--model_name_or_path $MODEL \
--data_path $DATA \
--bf16 True \
--fix_vit True \
--output_dir output_qwen_v3 \
--num_train_epochs 5 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 8 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 1000 \
--save_total_limit 10 \
--learning_rate 1e-5 \
--weight_decay 0.1 \
--adam_beta2 0.95 \
--warmup_ratio 0.01 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--report_to "none" \
--model_max_length 2048 \
--lazy_preprocess True \
--gradient_checkpointing \
--use_lora
And when trying to merge the model using the following code:
from peft import AutoPeftModelForCausalLM

model = AutoPeftModelForCausalLM.from_pretrained(
    "output_qwen_v3",
    device_map="auto",
    trust_remote_code=True
).eval()
merged_model = model.merge_and_unload()
# max_shard_size and safe_serialization are not necessary.
# They shard the checkpoint and save the model as safetensors, respectively.
new_model_directory = "output_qwen_v3_merged"  # target directory for the merged model
merged_model.save_pretrained(new_model_directory, max_shard_size="2048MB", safe_serialization=True)
The following error arises:
/usr/local/lib/python3.10/dist-packages/peft/tuners/lora/model.py in _unload_and_optionally_merge(self, merge, progressbar, safe_merge, adapter_names)
    425         if merge:
    426             if getattr(self.model, "quantization_method", None) == "gptq":
--> 427                 raise ValueError("Cannot merge LORA layers when the model is gptq quantized")
    428
    429         self._unloading_checks(adapter_names)
ValueError: Cannot merge LORA layers when the model is gptq quantized
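For anyone hitting the same error: peft blocks the merge because GPTQ stores packed integer weights, so the LoRA delta cannot be folded back into them. One possible workaround (a minimal sketch, not official Qwen guidance, assuming the PEFT wrapper forwards Qwen-VL's chat() method as in Qwen's Q-LoRA inference example) is to skip merge_and_unload() and run inference with the adapter kept on top of the quantized base, which AutoPeftModelForCausalLM already loads in one call:

# Sketch: inference with the adapter left un-merged on top of the GPTQ base.
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_dir = "output_qwen_v3"  # directory written by finetune.py --use_lora

model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_dir,
    device_map="auto",
    trust_remote_code=True,
).eval()
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat-Int4", trust_remote_code=True)

# Qwen-VL's remote-code chat interface, as shown in the model card.
query = tokenizer.from_list_format([
    {"image": "demo.jpeg"},            # hypothetical local image path
    {"text": "Describe this image."},
])
response, _ = model.chat(tokenizer, query=query, history=None)
print(response)

The adapter only adds a small extra matmul per adapted layer, so the inference overhead compared with a merged model is usually modest.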
期望行为 | Expected Behavior
As described at the end of the finetuning section of the README, this code should merge the LoRA adapter into the pre-trained model, producing a standalone model.
复现方法 | Steps To Reproduce
No response
运行环境 | Environment
- OS: Linux (Colab)
- peft==0.7
备注 | Anything else?
I need to merge the model to reduce latency during inference; currently, loading the base model and the LoRA adapter separately at inference time adds latency.
BTW, the latency issue is discussed here: Huggingface: merge-lora-weights-into-the-base-model
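If a single merged checkpoint is a hard requirement, one possible option (a sketch only, under the assumption that an adapter trained against the Int4 weights targets the same module names in the bf16 checkpoint and transfers acceptably; outputs may drift and this should be validated) is to apply the adapter to the non-quantized Qwen/Qwen-VL-Chat and merge there, since merge_and_unload() is only blocked for quantized backends:

# Sketch: merge the adapter into the full-precision base, which peft allows.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat",            # non-quantized base, so merging is permitted
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, "output_qwen_v3")
merged = model.merge_and_unload()
merged.save_pretrained("output_qwen_v3_merged",  # hypothetical output directory
                       max_shard_size="2048MB", safe_serialization=True)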
@AHMAD-DOMA could you please help me with how to finetune the INT4 model? I tried but failed. [email protected] please share the script with me if possible.
Have you solved this problem yet?