Implement 4-bit and 8-bit quantization for NVIDIA GPUs

Open VictorOdede opened this issue 2 years ago • 6 comments

This can be done with GPTQ.

Sep 04 '23 15:09 VictorOdede

@VictorOdede What sort of time commitment do you think this is?

Sep 11 '23 16:09 mmirman

If this isn't done with a library, it's a $200 ticket; if it is, it's a SWAG ticket.

Sep 11 '23 16:09 mmirman

> If this isn't done with a library, it's a $200 ticket; if it is, it's a SWAG ticket.

This can be done using the bitsandbytes library.

Sep 11 '23 16:09 VictorOdede

> @VictorOdede What sort of time commitment do you think this is?

A few hours at most.

Sep 11 '23 18:09 VictorOdede

@VictorOdede Is this issue resolved yet?

Sep 20 '23 11:09 bilal-aamer

Hey @bilal-aamer. This has already been implemented with bitsandbytes/GPTQ. I'm just running some tests before merging the PR.

Sep 20 '23 12:09 VictorOdede
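
For readers new to the topic, here is a rough illustration of what 8-bit and 4-bit quantization mean. It is a minimal pure-Python sketch of symmetric absmax round-to-nearest quantization, not the code from the PR mentioned above. bitsandbytes and GPTQ both use more elaborate schemes. The function names `quantize_absmax` and `dequantize` and the sample weights are hypothetical, chosen only for this example.

```python
def quantize_absmax(values, bits=8):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale.

    Illustrative only: a hypothetical helper, not a bitsandbytes or GPTQ API.
    """
    qmax = 2 ** (bits - 1) - 1                      # 127 for 8-bit, 7 for 4-bit
    absmax = max(abs(v) for v in values)            # largest magnitude sets the scale
    scale = absmax / qmax if absmax > 0 else 1.0    # avoid dividing by zero
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in quantized]


# Made-up sample weights, for illustration only.
weights = [0.12, -0.5, 0.33, 1.0, -0.07]

q8, s8 = quantize_absmax(weights, bits=8)
q4, s4 = quantize_absmax(weights, bits=4)

# Worst-case reconstruction error at each bit width.
err8 = max(abs(w - r) for w, r in zip(weights, dequantize(q8, s8)))
err4 = max(abs(w - r) for w, r in zip(weights, dequantize(q4, s4)))
print("8-bit codes:", q8, "max error:", err8)
print("4-bit codes:", q4, "max error:", err4)
```

The rounding error per value is bounded by half the scale, so dropping from 8 bits to 4 bits makes the worst-case error bound roughly 18 times larger (127 / 7). This is why practical 4-bit methods add refinements such as per-block scales or error compensation.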