gpt-fast
Activation quantization support
Many recent papers have addressed the challenges of activation quantization for LLMs.
Examples:
- https://github.com/ziplab/QLLM?tab=readme-ov-file#%F0%9F%9B%A0-install
- https://github.com/mit-han-lab/lmquant?tab=readme-ov-file#efficiency-benchmarks
- https://github.com/spcl/QuaRot
Would it be possible to add activation quantization support to gpt-fast for even more speedup? Any insight into the limitations and possibilities would be appreciated; a rough sketch of what I mean is included below.
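For concreteness, here is a minimal, hypothetical sketch of W8A8 linear layers (int8 weights plus int8 activations with per-token dynamic scales) in plain PyTorch. None of these names come from gpt-fast, and the forward pass dequantizes before the matmul purely to illustrate the numerics; an actual speedup would require a fused int8 GEMM that consumes the quantized tensors directly.

```python
# Hypothetical sketch (not gpt-fast code): per-token dynamic int8 activation
# quantization paired with per-output-channel int8 weights.
import torch


def quantize_per_token_int8(x: torch.Tensor):
    # x: (..., hidden). Symmetric per-token scales so each token uses the full int8 range.
    scales = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    x_int8 = torch.clamp(torch.round(x / scales), -128, 127).to(torch.int8)
    return x_int8, scales


class W8A8Linear(torch.nn.Module):
    def __init__(self, weight: torch.Tensor):
        super().__init__()
        # Per-output-channel symmetric weight quantization, done once offline.
        w_scales = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        w_int8 = torch.clamp(torch.round(weight / w_scales), -128, 127).to(torch.int8)
        self.register_buffer("w_int8", w_int8)
        self.register_buffer("w_scales", w_scales)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Activations are quantized dynamically at runtime, per token.
        x_int8, x_scales = quantize_per_token_int8(x)
        # Reference numerics only: dequantize and matmul in floating point.
        # A production path would run an int8 x int8 -> int32 GEMM and rescale
        # the accumulator with x_scales and w_scales.
        x_dq = x_int8.to(x.dtype) * x_scales
        w_dq = self.w_int8.to(x.dtype) * self.w_scales.to(x.dtype)
        return torch.nn.functional.linear(x_dq, w_dq)
```

My understanding is that per-token dynamic scaling is needed because activation ranges vary strongly from token to token, and the linked works go further (e.g., rotations in QuaRot, smoothing-style transforms elsewhere) to tame activation outliers so the int8 path stays accurate.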