llm-awq
inference speed
Hi, how does the inference speed of the implementation in this repo compare with that of exllama for a quantized model?
From my tests, AWQ has higher latency than exllama.
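For a fair comparison between the two backends, latency should be measured the same way for both: warm up first, then time several runs and report the median. Neither project's API is shown here; the sketch below is a generic timing harness where `generate_fn` is a hypothetical stand-in for whatever call produces tokens (e.g. each backend's generate method), so the same harness can be pointed at either implementation.

```python
import time
import statistics

def benchmark_latency(generate_fn, n_warmup=2, n_runs=5):
    """Return the median wall-clock latency (seconds) of generate_fn.

    Warm-up runs are discarded so one-time costs (kernel compilation,
    cache allocation) do not skew the measurement.
    """
    for _ in range(n_warmup):
        generate_fn()
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        generate_fn()
        times.append(time.perf_counter() - t0)
    return statistics.median(times)

# Stand-in workload for illustration; replace with the real
# generation call of the backend under test.
latency = benchmark_latency(lambda: sum(i * i for i in range(100_000)))
print(f"median latency: {latency * 1000:.2f} ms")
```

Reporting the median rather than the mean keeps a single slow outlier run from distorting the comparison.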