
QLoRA merging help needed.

gyupro opened this issue 10 months ago • 0 comments

Reminder

  • [X] I have read the README and searched the existing issues.

Reproduction

sh examples/merge_lora/merge.sh
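For context, the merge step this script wraps is, as far as I understand, roughly equivalent to the plain PEFT flow sketched below (just a sketch, not LLaMA-Factory's own code; all paths are placeholders):

```python
# Minimal sketch of a LoRA merge with the Hugging Face PEFT API.
# Paths are placeholders, not the actual paths from my run.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "path/to/base_model"      # placeholder: unquantized fp16/bf16 base, not the GPTQ checkpoint
ADAPTER_DIR = "path/to/lora_adapter"   # placeholder: output of the LoRA training run
EXPORT_DIR = "path/to/merged_model"    # placeholder: where the merged weights get saved

base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype="auto")
model = PeftModel.from_pretrained(base, ADAPTER_DIR)

# merge_and_unload folds the LoRA deltas into the base weights; this is the
# step that gets rejected when the base model is already quantized (GPTQ).
merged = model.merge_and_unload()
merged.save_pretrained(EXPORT_DIR)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.save_pretrained(EXPORT_DIR)
```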

Expected behavior

In the shell script I followed the provided example for QLoRA training: I trained a GPTQ-quantized model with LoRA. When I then tried to merge the adapter and run inference with the vLLM backend engine, it said:

DO NOT use quantized model or quantization_bit when merging lora weights

How am I supposed to serve a GPTQ model trained with LoRA using the vLLM backend?
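One route I am looking at, instead of merging, is loading the adapter at serve time. A minimal sketch, assuming vLLM's multi-LoRA support (enable_lora plus LoRARequest) works on top of a GPTQ base in this version; all paths and the adapter name are placeholders:

```python
# Sketch: serve the GPTQ base with the LoRA adapter applied at runtime,
# so no merge into the quantized weights is needed.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="path/to/gptq_base_model",  # placeholder: the GPTQ checkpoint used for training
    quantization="gptq",
    enable_lora=True,
)

sampling = SamplingParams(temperature=0.7, max_tokens=256)

# name, integer id, and local adapter path are placeholders
lora = LoRARequest("my_adapter", 1, "path/to/lora_adapter")

outputs = llm.generate(["Hello, how are you?"], sampling, lora_request=lora)
print(outputs[0].outputs[0].text)
```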

System Info

  • transformers version: 4.41.0.dev0
  • Platform: Linux-5.4.0-171-generic-x86_64-with-glibc2.31
  • Python version: 3.10.3
  • Huggingface_hub version: 0.19.4
  • Safetensors version: 0.4.2
  • Accelerate version: 0.27.2
  • Accelerate config:
      - compute_environment: LOCAL_MACHINE
      - distributed_type: DEEPSPEED
      - use_cpu: False
      - debug: False
      - num_processes: 2
      - machine_rank: 0
      - num_machines: 1
      - rdzv_backend: static
      - same_network: True
      - main_training_function: main
      - deepspeed_config: {'deepspeed_config_file': 'ds_config.json', 'zero3_init_flag': False}
      - downcast_bf16: no
      - tpu_use_cluster: False
      - tpu_use_sudo: False
      - tpu_env: []
      - dynamo_config: {'dynamo_backend': 'EAGER'}
  • PyTorch version (GPU?): 2.2.1 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Others

No response

gyupro · Apr 24 '24 07:04