gpt-fast
Activation quantization support
Many recent papers have addressed the challenges of activation quantization for LLMs.
Examples:
- https://github.com/ziplab/QLLM?tab=readme-ov-file#%F0%9F%9B%A0-install
- https://github.com/mit-han-lab/lmquant?tab=readme-ov-file#efficiency-benchmarks
- https://github.com/spcl/QuaRot
Would it be possible to add activation quantization support to gpt-fast for even more speedup? Any insight into the limitations and possibilities would be appreciated; a rough sketch of what I mean is included below.
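For concreteness, here is a minimal, hypothetical sketch of W8A8 linear layers (int8 weights plus int8 activations with per-token dynamic scales) in plain PyTorch. None of these names come from gpt-fast, and the forward pass dequantizes before the matmul purely to illustrate the numerics; an actual speedup would require a fused int8 GEMM that consumes the quantized tensors directly.

```python
# Hypothetical sketch (not gpt-fast code): per-token dynamic int8 activation
# quantization paired with per-output-channel int8 weights.
import torch


def quantize_per_token_int8(x: torch.Tensor):
    # x: (..., hidden). Symmetric per-token scales so each token uses the full int8 range.
    scales = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    x_int8 = torch.clamp(torch.round(x / scales), -128, 127).to(torch.int8)
    return x_int8, scales


class W8A8Linear(torch.nn.Module):
    def __init__(self, weight: torch.Tensor):
        super().__init__()
        # Per-output-channel symmetric weight quantization, done once offline.
        w_scales = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        w_int8 = torch.clamp(torch.round(weight / w_scales), -128, 127).to(torch.int8)
        self.register_buffer("w_int8", w_int8)
        self.register_buffer("w_scales", w_scales)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Activations are quantized dynamically at runtime, per token.
        x_int8, x_scales = quantize_per_token_int8(x)
        # Reference numerics only: dequantize and matmul in floating point.
        # A production path would run an int8 x int8 -> int32 GEMM and rescale
        # the accumulator with x_scales and w_scales.
        x_dq = x_int8.to(x.dtype) * x_scales
        w_dq = self.w_int8.to(x.dtype) * self.w_scales.to(x.dtype)
        return torch.nn.functional.linear(x_dq, w_dq)
```

My understanding is that per-token dynamic scaling is needed because activation ranges vary strongly from token to token, and the linked works go further (e.g., rotations in QuaRot, smoothing-style transforms elsewhere) to tame activation outliers so the int8 path stays accurate.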