[Feature]: bitsandbytes support
🚀 The feature, motivation and pitch
Bitsandbytes 4-bit quantization support. I know many people want this. It has been discussed before and marked as unplanned, but I took a look at how TGI implemented it: https://github.com/huggingface/text-generation-inference/blob/main/server/text_generation_server/utils/layers.py#L285 (and TGI is based on vLLM, of course).
Alternatives
I know that GPTQ quantization is better than bitsandbytes 4-bit, but bitsandbytes is great for merged QLoRA PEFT models, while it is almost impossible to GPTQ/AWQ-quantize a bitsandbytes 4-bit model (and I am not even talking about the NF4 vs. FP4 perplexity question), since that is not officially supported. Others do sometimes manage to quantize a merged bitsandbytes QLoRA model to GPTQ or AWQ, but I, for example, have not.
Additional context
As I mentioned above, https://github.com/huggingface/text-generation-inference/blob/main/server/text_generation_server/utils/layers.py#L285 looks like a very simple implementation of the Linear4bit class for bitsandbytes. I could open a PR for vLLM myself; I just wondered why it has not been added already. Maybe I am missing something?
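For reference, here is a rough sketch of what such a layer looks like, loosely adapted from the TGI code linked above and using bitsandbytes' Params4bit / matmul_4bit API. This is only an illustration of the approach, not vLLM code:

```python
# Rough sketch of a bnb 4-bit linear layer, adapted from the TGI layer linked
# above. Illustrative only; not the vLLM implementation.
from typing import Optional

import torch
import bitsandbytes as bnb
from bitsandbytes.nn import Params4bit


class Linear4bit(torch.nn.Module):
    def __init__(self, weight: torch.Tensor, bias: Optional[torch.Tensor], quant_type: str = "nf4"):
        super().__init__()
        # Params4bit quantizes the weight block-wise when it is moved to the GPU
        # and keeps the quantization state (absmax, blocksize, code) alongside it.
        self.weight = Params4bit(
            weight.data, requires_grad=False, compress_statistics=True, quant_type=quant_type
        )
        self.weight.cuda(weight.device)  # moving to GPU triggers the quantization
        self.bias = bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        bias = None if self.bias is None else self.bias.to(x.dtype)
        # matmul_4bit dequantizes the 4-bit blocks on the fly and runs the GEMM.
        return bnb.matmul_4bit(
            x, self.weight.t(), bias=bias, quant_state=self.weight.quant_state
        )
```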
BNB 4-bit is a very useful feature. Many models don't have GPTQ or AWQ quantized versions, and quantizing a large model with post-training methods takes real work.
Everyone knows that post-training quantization gives better performance, but many people like me don't care about a small quality loss when trying out a demo.
After the release of Llama 3, I can only run the 8B version with vLLM, and I have to switch to Ollama to run the 70B version.
want +1
+1
want +1
+1
+1
+1
It would be very useful for QLoRA fine-tuned models. Is there a roadmap for this addition?
+1
+1
+1
+1
Please stop commenting +1; just react to the original post with the thumbs-up emoji. Such comments add no value and notify everyone subscribed to this issue.
Refer to: https://github.com/vllm-project/vllm/pull/4776
want +1
related to https://github.com/vllm-project/vllm/issues/3339
What's required to implement this? FP4 and NF4 support?
It seems like bnb uses a format with 2 exponent bits and 1 mantissa bit for FP4. https://github.com/TimDettmers/bitsandbytes/blob/25abf8d95f8a33f38e2ce6f637768b442379ccd9/bitsandbytes/functional.py#L1049-L1059
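For intuition, here is a small sketch enumerating the values of a generic 4-bit float with 1 sign, 2 exponent and 1 mantissa bit. The exponent bias of 1 and the subnormal handling are my assumptions; the codebook that bnb builds in the linked function is also normalized against the block absmax, so the concrete values can differ:

```python
# Enumerate all 16 codes of a toy 4-bit float (1 sign, 2 exponent, 1 mantissa bit).
# Assumes exponent bias = 1 and subnormals for e == 0; this only illustrates how
# few values FP4 can represent, not the exact bitsandbytes codebook.
def fp4_value(bits: int, bias: int = 1) -> float:
    sign = -1.0 if (bits >> 3) & 1 else 1.0
    e = (bits >> 1) & 0b11   # 2 exponent bits
    m = bits & 0b1           # 1 mantissa bit
    if e == 0:               # subnormal: no implicit leading 1
        return sign * (m / 2) * 2 ** (1 - bias)
    return sign * (1 + m / 2) * 2 ** (e - bias)

values = sorted({fp4_value(b) for b in range(16)})
print(values)  # 15 distinct values (+0/-0 collapse): 0.0, ±0.5, ±1, ±1.5, ±2, ±3, ±4, ±6
```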
+1
Hi, those who need this feature should check out what @chenqianfzh is working on here: https://github.com/vllm-project/vllm/pull/4776
Hi team, when can we expect this feature?
+1, any update on this? It seems @chenqianfzh's https://github.com/vllm-project/vllm/pull/4776 is not working with Llama 3.
bitsandbytes is now supported: https://docs.vllm.ai/en/latest/quantization/supported_hardware.html
It's not working for Llama 3. In https://github.com/bd-iaas-us/vllm/blob/e16bcb69495540b21a3bd9423cdd5df8a78405ea/tests/quantization/test_bitsandbytes.py, replace the model with Llama 3 8B and the tests fail. @hmellor @chenqianfzh
@hmellor, how do you load in 8-bit? This version only seems to be able to load in 4-bit via quantization="bitsandbytes", load_format="bitsandbytes".
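For anyone arriving here, a minimal sketch of the 4-bit path mentioned in the question above (the model id is just a placeholder; whether an 8-bit path exists is exactly the open question):

```python
# Minimal example of loading a model with in-flight bnb 4-bit quantization in
# vLLM, using the quantization/load_format arguments mentioned above.
from vllm import LLM, SamplingParams

llm = LLM(
    model="huggyllama/llama-7b",      # placeholder model id
    quantization="bitsandbytes",
    load_format="bitsandbytes",
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.8, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```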