
[REQUEST] DeepSpeed-FastGen AWQ support

Open NaCloudAI opened this issue 1 year ago • 3 comments

Is your feature request related to a problem? Please describe. Without 4-bit quantization, the batch size is limited by GPU memory.
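To make the memory argument concrete, here is a rough back-of-the-envelope sketch. It is not DeepSpeed-FastGen code: the function name `max_batch_size` and all numbers (a 7B-parameter model, a 24 GiB GPU, 0.5 MiB of fp16 KV cache per token, 2048-token sequences) are illustrative assumptions, and the estimate ignores activations, quantization metadata such as scales and zero points, and allocator overhead.

```python
def max_batch_size(gpu_mem_gib, n_params_billion, bits_per_weight,
                   kv_bytes_per_token, seq_len):
    """Estimate how many sequences fit once the weights are loaded.

    Hypothetical helper for illustration only: it counts weight memory
    and KV-cache memory and nothing else.
    """
    # Memory taken by the model weights at the given precision.
    weight_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    # Whatever is left over is assumed to be available for the KV cache.
    free_bytes = gpu_mem_gib * 2**30 - weight_bytes
    # KV cache needed by one full-length sequence.
    per_sequence_bytes = kv_bytes_per_token * seq_len
    return max(0, int(free_bytes // per_sequence_bytes))


# Assumed KV cost per token for a 7B-class model in fp16:
# 2 (K and V) * 32 layers * 4096 hidden size * 2 bytes = 524288 bytes.
KV_BYTES_PER_TOKEN = 2 * 32 * 4096 * 2

fp16_batch = max_batch_size(24, 7, 16, KV_BYTES_PER_TOKEN, 2048)
int4_batch = max_batch_size(24, 7, 4, KV_BYTES_PER_TOKEN, 2048)
print(f"fp16 weights:  about {fp16_batch} sequences")
print(f"4-bit weights: about {int4_batch} sequences")
```

Under these assumptions, 4-bit weights leave roughly twice as much room for the KV cache as fp16 weights, which is why the batch size can grow.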

Describe the solution you'd like. Add AWQ (Activation-aware Weight Quantization) support, as TGI (Text Generation Inference) already has.

Describe alternatives you've considered. Other 4-bit quantization methods, but AWQ is the best so far.


NaCloudAI, Nov 05 '23 02:11

@NaCloudAI FriendliAI PeriFlow (friendli.ai/try-periflow) natively supports inference serving for AWQ-quantized models. Here is a blog post: https://friendli.ai/blog/activation-aware-weight-quantization-periflow/

bgchun, Nov 06 '23 08:11

@bgchun Your website is not working. [image]

NaCloudAI, Dec 06 '23 23:12

Any updates on this?

vidhyat98, May 02 '24 18:05