inference speed

Open • frankxyy opened this issue 1 year ago • 1 comment

Hi, how does the inference speed of the implementation in this repo compare with that of exllama for a quantized model?

frankxyy, Nov 29 '23 15:11

In my tests, AWQ has worse latency than exllama.

fxmarty, Dec 01 '23 10:12