
Unsloth with vLLM in 8/4 bits

quancore opened this issue 5 months ago · 7 comments

I have trained a QLoRA model with Unsloth and I want to serve it with vLLM, but I haven't found a way to serve the model in 8/4 bits.

quancore avatar Mar 16 '24 22:03 quancore

@quancore I'm not sure if vLLM allows serving in 4 or 8 bits! 16-bit yes, but unsure on 4 or 8.

danielhanchen avatar Mar 17 '24 02:03 danielhanchen
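
For 16-bit serving, the usual route is to merge the QLoRA adapter back into its base model and point vLLM at the merged checkpoint. Below is a minimal sketch using plain PEFT/Transformers rather than any Unsloth-specific export helper; the directory names are placeholders and it assumes the adapter was saved in the standard PEFT format:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

ADAPTER_DIR = "path/to/unsloth-qlora-adapter"  # placeholder: your saved LoRA adapter
MERGED_DIR = "merged-16bit"                    # placeholder: output dir for the merged model

# Load the base model plus the LoRA adapter in fp16, then fold the adapter into the weights
model = AutoPeftModelForCausalLM.from_pretrained(ADAPTER_DIR, torch_dtype=torch.float16)
merged = model.merge_and_unload()
merged.save_pretrained(MERGED_DIR)
AutoTokenizer.from_pretrained(ADAPTER_DIR).save_pretrained(MERGED_DIR)
```

The merged directory can then be loaded by vLLM directly (e.g. `--model merged-16bit`), but at the full 16-bit VRAM cost.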

@danielhanchen I think it is: https://github.com/vllm-project/vllm/issues/1155

quancore avatar Mar 17 '24 13:03 quancore

@danielhanchen I think it is: vllm-project/vllm#1155

Looks like they only support AWQ quantization, not bitsandbytes.

patleeman avatar Mar 19 '24 12:03 patleeman
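
Since vLLM takes AWQ but not bitsandbytes checkpoints, one possible workaround is to quantize the merged 16-bit model offline with the AutoAWQ library and serve that instead. A rough sketch, assuming a merged checkpoint like the one above; the paths are placeholders and the quant config values are the commonly used 4-bit AutoAWQ settings:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

MERGED_DIR = "merged-16bit"   # placeholder: merged fp16 checkpoint from the previous step
AWQ_DIR = "merged-awq-4bit"   # placeholder: output dir for the 4-bit AWQ model

quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(MERGED_DIR)
tokenizer = AutoTokenizer.from_pretrained(MERGED_DIR)

# Runs AWQ calibration (on AutoAWQ's default calibration set) and quantizes the weights to 4-bit
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(AWQ_DIR)
tokenizer.save_pretrained(AWQ_DIR)
```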

@patleeman Oh ye AWQ is great - I'm assuming you want to quantize it to AWQ?

danielhanchen avatar Mar 20 '24 04:03 danielhanchen

@patleeman @danielhanchen Well yes, maybe we should support AWQ so we can use QLoRA models with vLLM?

quancore avatar Mar 20 '24 17:03 quancore

Hello there. I am also interested in serving an 8/4-bit model trained with Unsloth via vLLM. Currently it works fine in 16 bits but requires too much VRAM. Is there a way to quantize a model trained with Unsloth using AWQ or GPTQ?

marcelodiaz558 avatar Apr 06 '24 03:04 marcelodiaz558
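
A quantized checkpoint like the AWQ one sketched above can then be served with vLLM by passing the quantization method explicitly. A minimal sketch, assuming a reasonably recent vLLM build with AWQ support; the path is a placeholder:

```python
from vllm import LLM, SamplingParams

# Load the 4-bit AWQ checkpoint produced earlier (placeholder path)
llm = LLM(model="merged-awq-4bit", quantization="awq")

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain QLoRA in one sentence."], params)
print(outputs[0].outputs[0].text)
```

The same checkpoint should also work with the OpenAI-compatible server, e.g. `python -m vllm.entrypoints.openai.api_server --model merged-awq-4bit --quantization awq`, which cuts weight memory to roughly a quarter of the fp16 footprint.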