[GenAI] Use BitsAndBytes for 4bit quantization.

Open LittleLittleCloud opened this issue 9 months ago • 6 comments

We are excited to review your PR.

So we can do the best job, please check:

  • [x] There's a descriptive title that will make sense to other developers some time from now.
  • [x] There's associated issues. All PR's should have issue(s) associated - unless a trivial self-evident change such as fixing a typo. You can use the format Fixes #nnnn in your description to cause GitHub to automatically close the issue(s) when your PR is merged.
  • [x] Your change description explains what the change does, why you chose your approach, and anything else that reviewers should know.
  • [x] You have included any necessary tests in the same PR.

This PR uses the 4-bit quantization method from the bitsandbytes library to quantize linear layers to 4 bits.

What's bitsandbytes

bitsandbytes is a library used by Hugging Face Transformers to support 4-bit and 8-bit quantization and the operations on quantized tensors.

bitsandbytes is written in CUDA, and we provide a C# binding library, LittleLittleCloud.TorchSharp.BitsAndBytes, so it can be used easily from TorchSharp.
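
For reviewers unfamiliar with the idea, here is a minimal sketch of blockwise 4-bit quantization in plain Python. It is an illustration only: the function names, the block size, and the uniform signed codebook are assumptions made for this example, and it does not reproduce the CUDA kernels or the 4-bit data types that bitsandbytes actually uses. Each block of weights is scaled by its largest absolute value, rounded to one of 15 signed levels, and two codes are packed into each byte.

```python
from typing import List, Tuple


def quantize_4bit(weights: List[float], block_size: int = 64) -> Tuple[bytes, List[float]]:
    """Blockwise absmax quantization to 4-bit codes, two codes per byte (illustrative)."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        # One scale per block; fall back to 1.0 for an all-zero block.
        absmax = max(abs(w) for w in block) or 1.0
        scales.append(absmax)
        for w in block:
            q = round(w / absmax * 7)  # map [-absmax, absmax] onto [-7, 7]
            codes.append(q + 8)        # shift into an unsigned nibble, [1, 15]
    if len(codes) % 2:
        codes.append(8)                # pad with the code for 0.0
    packed = bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))
    return packed, scales


def dequantize_4bit(packed: bytes, scales: List[float], n: int, block_size: int = 64) -> List[float]:
    """Unpack the nibbles and rescale each code by its block's scale."""
    codes: List[int] = []
    for b in packed:
        codes.append(b >> 4)
        codes.append(b & 0x0F)
    return [(codes[i] - 8) / 7 * scales[i // block_size] for i in range(n)]
```

With this scheme the storage cost is half a byte per weight plus one scale per block, and the round-trip error of each weight is bounded by its block's scale divided by 14.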

LittleLittleCloud, Mar 02 '25 01:03

/azp run

LittleLittleCloud, Mar 03 '25 18:03

Azure Pipelines successfully started running 2 pipeline(s).

azure-pipelines[bot], Mar 03 '25 18:03

/azp run

LittleLittleCloud, Mar 04 '25 06:03

Azure Pipelines successfully started running 2 pipeline(s).

azure-pipelines[bot], Mar 04 '25 06:03

/azp run

LittleLittleCloud, Mar 04 '25 21:03

Azure Pipelines successfully started running 2 pipeline(s).

azure-pipelines[bot], Mar 04 '25 21:03