LLaMA-Factory
QLoRA merging help needed
Reminder
- [X] I have read the README and searched the existing issues.
Reproduction
sh examples/merge_lora/merge.sh
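For reference, this script wraps LLaMA-Factory's export entry point. A minimal sketch of the kind of invocation it runs, assuming the older script-style interface (newer versions use llamafactory-cli export with a YAML config) and placeholder paths for the base model and adapter:

# Hypothetical paths; model_name_or_path points at the GPTQ-quantized base checkpoint
# and adapter_name_or_path at the LoRA output directory produced by training.
CUDA_VISIBLE_DEVICES=0 python src/export_model.py \
    --model_name_or_path /path/to/gptq_base_model \
    --adapter_name_or_path /path/to/lora_adapter \
    --template default \
    --finetuning_type lora \
    --export_dir /path/to/merged_model

It is this export/merge step that fails for me when the base model is a quantized (GPTQ) checkpoint.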
Expected behavior
In the shell script, I followed the provided example of how QLoRA is trained. I trained a GPTQ model with LoRA, and when I tried to run inference with the vLLM backend engine, it said:
DO NOT use quantized model or quantization_bit when merging lora weights
How am I supposed to serve a GPTQ model trained with LoRA using the vLLM backend?
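For context, what I would like to end up with is something like the following vLLM invocation, where the GPTQ base model and the LoRA adapter are served together without a merge step (paths and the adapter name are placeholders, and I am not sure this combination is supported for GPTQ bases):

# Placeholder paths; my_adapter is just an illustrative adapter name.
python -m vllm.entrypoints.openai.api_server \
    --model /path/to/gptq_base_model \
    --quantization gptq \
    --enable-lora \
    --lora-modules my_adapter=/path/to/lora_adapter

If that is not viable, the only alternative I can think of is merging the adapter into the original unquantized model and re-quantizing afterwards, but I would like to know the recommended path.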
System Info
- transformers version: 4.41.0.dev0
- Platform: Linux-5.4.0-171-generic-x86_64-with-glibc2.31
- Python version: 3.10.3
- Huggingface_hub version: 0.19.4
- Safetensors version: 0.4.2
- Accelerate version: 0.27.2
- Accelerate config:
  - compute_environment: LOCAL_MACHINE
  - distributed_type: DEEPSPEED
  - use_cpu: False
  - debug: False
  - num_processes: 2
  - machine_rank: 0
  - num_machines: 1
  - rdzv_backend: static
  - same_network: True
  - main_training_function: main
  - deepspeed_config: {'deepspeed_config_file': 'ds_config.json', 'zero3_init_flag': False}
  - downcast_bf16: no
  - tpu_use_cluster: False
  - tpu_use_sudo: False
  - tpu_env: []
  - dynamo_config: {'dynamo_backend': 'EAGER'}
- PyTorch version (GPU?): 2.2.1 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Others
No response