LLaMA-Factory
QLoRA merging help needed
Reminder
- [X] I have read the README and searched the existing issues.
Reproduction
sh examples/merge_lora/merge.sh
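For reference, this script wraps LLaMA-Factory's export entry point. A minimal sketch of the kind of invocation it runs, assuming the older script-style interface (newer versions use llamafactory-cli export with a YAML config) and placeholder paths for the base model and adapter:

# Hypothetical paths; model_name_or_path points at the GPTQ-quantized base checkpoint
# and adapter_name_or_path at the LoRA output directory produced by training.
CUDA_VISIBLE_DEVICES=0 python src/export_model.py \
    --model_name_or_path /path/to/gptq_base_model \
    --adapter_name_or_path /path/to/lora_adapter \
    --template default \
    --finetuning_type lora \
    --export_dir /path/to/merged_model

It is this export/merge step that fails for me when the base model is a quantized (GPTQ) checkpoint.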
Expected behavior
In the shell script, I followed the provided example of how QLoRA is trained. I trained a GPTQ model with LoRA, and when I tried to run inference with the vLLM backend engine, it said:
DO NOT use quantized model or quantization_bit when merging lora weights
How am I supposed to serve a GPTQ model trained with LoRA using the vLLM backend?
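For context, what I would like to end up with is something like the following vLLM invocation, where the GPTQ base model and the LoRA adapter are served together without a merge step (paths and the adapter name are placeholders, and I am not sure this combination is supported for GPTQ bases):

# Placeholder paths; my_adapter is just an illustrative adapter name.
python -m vllm.entrypoints.openai.api_server \
    --model /path/to/gptq_base_model \
    --quantization gptq \
    --enable-lora \
    --lora-modules my_adapter=/path/to/lora_adapter

If that is not viable, the only alternative I can think of is merging the adapter into the original unquantized model and re-quantizing afterwards, but I would like to know the recommended path.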
System Info
- transformers version: 4.41.0.dev0
- Platform: Linux-5.4.0-171-generic-x86_64-with-glibc2.31
- Python version: 3.10.3
- Huggingface_hub version: 0.19.4
- Safetensors version: 0.4.2
- Accelerate version: 0.27.2
- Accelerate config:
  - compute_environment: LOCAL_MACHINE
  - distributed_type: DEEPSPEED
  - use_cpu: False
  - debug: False
  - num_processes: 2
  - machine_rank: 0
  - num_machines: 1
  - rdzv_backend: static
  - same_network: True
  - main_training_function: main
  - deepspeed_config: {'deepspeed_config_file': 'ds_config.json', 'zero3_init_flag': False}
  - downcast_bf16: no
  - tpu_use_cluster: False
  - tpu_use_sudo: False
  - tpu_env: []
  - dynamo_config: {'dynamo_backend': 'EAGER'}
- PyTorch version (GPU?): 2.2.1 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Others
No response