Implement 4-bit and 8-bit quantization for NVIDIA GPUs
Can be done with GPTQ
@VictorOdede What sort of time commitment do you think this is?
If this isn't done with a library, it's a $200 ticket; if it is, it's a SWAG ticket.
This can be done using the bitsandbytes library.
@VictorOdede What sort of time commitment do you think this is?
A few hours max
@VictorOdede Is this issue resolved yet?
Hey @bilal-aamer. This has already been implemented with bitsandbytes/GPTQ. I'm just running some tests before merging the PR.
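For anyone reading this thread later, here is a rough sketch of the basic idea behind 8-bit and 4-bit weight quantization. It is a plain-Python illustration of symmetric absmax round-to-nearest quantization, not the algorithm bitsandbytes or GPTQ actually use and not the code in the PR. The function names and the sample weights are made up for this example.

```python
def quantize_absmax(values, bits=8):
    """Map floats to signed integers using a single absmax scale (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    absmax = max(abs(v) for v in values)
    # Avoid dividing by zero when every value is 0.
    scale = absmax / qmax if absmax > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integer levels."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only to show the effect of fewer bits.
weights = [0.12, -0.5, 0.33, 0.0, 0.49]

q8, s8 = quantize_absmax(weights, bits=8)
q4, s4 = quantize_absmax(weights, bits=4)

# Worst-case reconstruction error for each bit width.
err8 = max(abs(w - r) for w, r in zip(weights, dequantize(q8, s8)))
err4 = max(abs(w - r) for w, r in zip(weights, dequantize(q4, s4)))

print("8-bit levels:", q8, "max error:", err8)
print("4-bit levels:", q4, "max error:", err4)
```

Fewer bits mean a coarser grid and a larger rounding error, which is why library implementations add refinements such as per-block scales or error-aware rounding on top of this basic scheme.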