llm-awq inference speed

Open frankxyy opened this issue 1 year ago • 1 comment

Hi, how does the inference speed of the implementation in this repo compare with that of exllama for a quantized model?

Nov 29 '23 15:11 frankxyy

In my tests, AWQ has worse latency than exllama.

Dec 01 '23 10:12 fxmarty