DeepSpeed-MII
Quantization Support for FastGen?
Hello, does the newly released FastGen support AWQ or GPTQ quantization for any of the models it serves?
Adding quantization support is a high-priority item on our roadmap! We are working to add it soon and will share more information as the timeline becomes more concrete.
Is there any plan to support 8-bit quantization for Mistral in the near future?
hi @cmikeh2, is there any update on AWQ support?
Of the recent techniques, SmoothQuant from MIT seems extremely promising for serving. It is W8A8 quantization, so the weights do not need to be dequantized during inference; the matmuls run directly in INT8. This means inference with SmoothQuant can achieve better latency and throughput than FP16.
Implementation: https://github.com/AniZpZ/AutoSmoothQuant
PR for vLLM: https://github.com/vllm-project/vllm/pull/1508
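For context, here is a minimal sketch of the W8A8 idea behind SmoothQuant: a per-channel smoothing factor migrates activation outliers into the weights so that both can be quantized to INT8, and the matmul runs in INT8 with a single dequantization of the accumulator at the end. The helper names (`smooth_scales`, `quantize_int8`) and the `alpha=0.5` setting are illustrative only, not the AutoSmoothQuant or DeepSpeed-MII API.

```python
# Minimal sketch of the SmoothQuant (W8A8) idea, assuming PyTorch.
import torch

def smooth_scales(act_absmax, weight, alpha=0.5):
    # Per-input-channel smoothing factor that shifts quantization
    # difficulty from activations (outlier channels) to weights.
    w_absmax = weight.abs().amax(dim=0)                      # [in_features]
    return (act_absmax.pow(alpha) / w_absmax.pow(1 - alpha)).clamp(min=1e-5)

def quantize_int8(t):
    # Symmetric per-tensor INT8 quantization; returns int8 tensor and scale.
    scale = t.abs().max() / 127.0
    q = torch.clamp(torch.round(t / scale), -128, 127).to(torch.int8)
    return q, scale

# Toy example: one linear layer with calibration statistics.
torch.manual_seed(0)
x = torch.randn(4, 16) * torch.linspace(0.1, 8.0, 16)       # outlier channels
w = torch.randn(32, 16)                                      # [out, in]

s = smooth_scales(x.abs().amax(dim=0), w)                    # [in_features]
x_smooth, w_smooth = x / s, w * s                            # balanced ranges

xq, sx = quantize_int8(x_smooth)
wq, sw = quantize_int8(w_smooth)

# INT8 GEMM (emulated in float here), dequantized once at the end,
# i.e. no per-element dequantization inside the matmul.
y_w8a8 = (xq.float() @ wq.float().T) * (sx * sw)
y_ref = x @ w.T
print("max abs error:", (y_w8a8 - y_ref).abs().max().item())
```

The smoothing step is mathematically exact (`(x / s) @ (w * s).T == x @ w.T`), so the only error comes from the INT8 rounding, which the balanced ranges keep small even with activation outliers.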