ABQ-LLM

Published 20 hours ago •


[Question] The end-to-end generation speed and W4A4

Open aur61 opened this issue 4 months ago • 0 comments

Great job, starred! I do have a few questions:

Did you test the end-to-end (e2e) generation speed, specifically tokens/second or first-token latency?
For W4A4, the speedup is about 1x. Could you share the reason behind this, and is there any potential for improvement?
For W4A4, was the comparison baseline fp16 or bf16?
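
For concreteness, the two metrics named in the first question could be measured roughly as follows. This is only a minimal sketch and not ABQ-LLM's API: `measure_generation` and `dummy_stream` are hypothetical names, and the stream stands in for whatever token iterator the inference engine exposes.

```python
import time
from typing import Callable, Dict, Iterable, Iterator


def measure_generation(
    stream: Iterable,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Consume a token stream; report first-token latency and throughput."""
    start = clock()
    first = None
    last = start
    num_tokens = 0
    for _ in stream:
        now = clock()
        if first is None:
            first = now  # arrival time of the first token
        last = now
        num_tokens += 1
    if num_tokens == 0:
        raise ValueError("stream produced no tokens")

    total = last - start
    decode_time = last - first
    return {
        "num_tokens": num_tokens,
        # Time to first token: dominated by prompt processing (prefill).
        "ttft_s": first - start,
        # End-to-end rate, prefill included.
        "e2e_tokens_per_s": num_tokens / total if total > 0 else float("nan"),
        # Decode-only rate: tokens after the first, over the time they took.
        "decode_tokens_per_s": (
            (num_tokens - 1) / decode_time if decode_time > 0 else float("nan")
        ),
    }


def dummy_stream(n: int = 5, delay_s: float = 0.01) -> Iterator[int]:
    """Hypothetical stand-in for a real model's streaming generator."""
    for i in range(n):
        time.sleep(delay_s)
        yield i


if __name__ == "__main__":
    print(measure_generation(dummy_stream()))
```

Reporting first-token latency separately from the decode rate matters here, because a low-bit kernel may speed up one phase much more than the other.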

Oct 13 '24 15:10 aur61