
[Question] End-to-end generation speed and W4A4

Open • aur61 opened this issue 4 months ago • 0 comments

Great job, starred! I do have a few questions:

  1. Did you test end-to-end (e2e) generation speed, for example throughput in tokens/second or first-token latency?
  2. For W4A4, the speedup is only about 1×. Could you explain why, and is there potential for improvement?
  3. For W4A4, which baseline did you compare against: FP16 or BF16?


aur61 · Oct 13 '24 15:10