hqq
integrated into gpt-fast
Is it possible to easily integrate hqq's quantization and forward pass into the gpt-fast repo? gpt-fast has int8 and int4 quantization; I want to replace them with hqq and use hqq for low-bit inference while keeping the rest of the structure unchanged. What is the easiest way to do this with the least code change? Thanks for any valuable advice!
It's already integrated into torchao: https://github.com/pytorch/ao/releases/tag/v0.5.0
So you can just use `quantize_(model, int4_weight_only(group_size, use_hqq=True))`, for example.
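A minimal sketch of what that looks like end to end, assuming torchao >= 0.5.0 is installed and the model is a standard PyTorch module (e.g. one loaded by gpt-fast); the model loading is a placeholder:

```python
import torch
from torchao.quantization import quantize_, int4_weight_only

# Placeholder: load your gpt-fast model as usual.
model = load_model(...)  # hypothetical loader, replace with your own

# Apply int4 weight-only quantization with the HQQ algorithm.
# group_size controls quantization granularity (e.g. 64 or 128).
quantize_(model, int4_weight_only(group_size=128, use_hqq=True))
```

Since `quantize_` swaps the weights in place, the rest of gpt-fast's inference path (compilation, generation loop, etc.) should work unchanged.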