byaldi
Added BitsAndBytes Support
Hello Byaldi Team!
Description
I added BitsAndBytes support for all of us GPU-poor people. This enables 4-bit/8-bit quantization, which lets the models run on smaller GPUs or, in my case, leaves room for a bigger LLM.
Changes Made
- Added BitsAndBytes quantization options to model loading
- Updated dependencies to include bitsandbytes
- Added quant_strategy in the example notebook
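To sketch how this could look from the caller's side: below is a minimal, hypothetical helper that maps a `quant_strategy` string to the keyword arguments `transformers.BitsAndBytesConfig` accepts. The helper name and the exact strategy strings are assumptions for illustration, not the actual API added in this PR.

```python
def make_quant_kwargs(quant_strategy):
    """Map a quant_strategy string to BitsAndBytesConfig-style kwargs.

    Hypothetical helper: the accepted strings ("4bit", "8bit", None)
    are assumptions; only the kwarg names mirror what transformers'
    BitsAndBytesConfig actually takes.
    """
    if quant_strategy == "4bit":
        # NF4 is the 4-bit quant type popularized by QLoRA.
        return {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"}
    if quant_strategy == "8bit":
        return {"load_in_8bit": True}
    if quant_strategy is None:
        # No quantization: load the model at full precision.
        return {}
    raise ValueError(f"unknown quant_strategy: {quant_strategy!r}")
```

The resulting dict would then be passed into `BitsAndBytesConfig(**kwargs)` before model loading, so callers only ever see the single `quant_strategy` string.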
Testing
- I am using Byaldi in a commercial setting and the 4-bit quantization didn't affect performance.