machinelearning
[GenAI] Use BitsAndBytes for 4bit quantization.
We are excited to review your PR.
So we can do the best job, please check:
- [x] There's a descriptive title that will make sense to other developers some time from now.
- [x] There are associated issues. All PRs should have issue(s) associated, unless it's a trivial, self-evident change such as fixing a typo. You can use the format `Fixes #nnnn` in your description to cause GitHub to automatically close the issue(s) when your PR is merged.
- [x] Your change description explains what the change does, why you chose your approach, and anything else that reviewers should know.
- [x] You have included any necessary tests in the same PR.
This PR uses the 4-bit quantization method from the bitsandbytes library to quantize linear layers to 4 bits.
What's bitsandbytes?
bitsandbytes is the library used by Hugging Face transformers to provide support for 4-bit and 8-bit quantization and the corresponding operations.
bitsandbytes is written in CUDA, and we provide a C# binding library, LittleLittleCloud.TorchSharp.BitsAndBytes, to make it easy to use from TorchSharp.
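For intuition, here is a minimal NumPy sketch of blockwise absmax 4-bit quantization, the basic idea underlying bitsandbytes' 4-bit scheme. This is a conceptual illustration only: the function names are hypothetical, and it uses a simple symmetric integer grid rather than bitsandbytes' actual NF4 code book or its packed storage format.

```python
import numpy as np

def quantize_4bit(weights, block_size=64):
    """Blockwise absmax 4-bit quantization (conceptual sketch).

    Each block of `block_size` values is scaled by its absolute
    maximum, then rounded to one of 15 signed levels in [-7, 7]
    (a plain symmetric grid, not bitsandbytes' NF4 code book).
    Returns the 4-bit codes plus one float scale per block.
    """
    flat = weights.ravel().astype(np.float32)
    pad = (-len(flat)) % block_size          # pad so length divides evenly
    flat = np.concatenate([flat, np.zeros(pad, dtype=np.float32)])
    blocks = flat.reshape(-1, block_size)
    # One scale per block: the block's absolute maximum.
    scales = np.abs(blocks).max(axis=1, keepdims=True)
    scales[scales == 0] = 1.0                # avoid division by zero
    # Map each value to a signed 4-bit integer in [-7, 7].
    codes = np.clip(np.round(blocks / scales * 7), -7, 7).astype(np.int8)
    return codes, scales

def dequantize_4bit(codes, scales):
    """Invert the mapping: rescale the codes back to floats."""
    return codes.astype(np.float32) / 7.0 * scales

# Round-trip a small weight matrix through 4-bit quantization.
w = np.random.randn(4, 64).astype(np.float32)
codes, scales = quantize_4bit(w)
w_hat = dequantize_4bit(codes, scales).reshape(w.shape)
```

Because each block is normalized by its own absolute maximum, the per-element reconstruction error is bounded by half a quantization step, i.e. at most `scale / 14` within each block; a real implementation would additionally pack two 4-bit codes per byte to realize the memory savings.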
/azp run
Azure Pipelines successfully started running 2 pipeline(s).