ABQ-LLM
[Question] The end-to-end generation speed and W4A4
Great job, starred! I do have a few questions:
- Did you measure end-to-end generation speed, e.g. throughput in tokens/second or the latency of the first token?
- For W4A4, the speedup is only about 1× (i.e., essentially no gain). What is the reason for this, and is there room for improvement?
- For W4A4, was the baseline fp16 or bf16?
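To make the first question concrete, here is a minimal sketch of how I would compute these metrics from wall-clock timestamps. `latency_metrics`, `measure`, `speedup`, and the `stream` callable are hypothetical names for illustration, not part of ABQ-LLM. The sketch assumes each yielded token is already available on the host, so its timestamp reflects finished GPU work. Decode throughput excludes the first token, because that latency is dominated by prefill.

```python
import time
from typing import Callable, Iterable, List, Tuple


def latency_metrics(start: float, token_times: List[float]) -> Tuple[float, float]:
    """Return (time_to_first_token_s, decode_tokens_per_s).

    start: wall-clock time the request was issued.
    token_times: wall-clock time each generated token became available, in order.
    """
    if not token_times:
        raise ValueError("no tokens generated")
    ttft = token_times[0] - start
    # Throughput over the decode phase only (tokens after the first one).
    decode_time = token_times[-1] - token_times[0]
    n_decode = len(token_times) - 1
    tps = n_decode / decode_time if decode_time > 0 else float("inf")
    return ttft, tps


def measure(stream: Callable[[], Iterable[object]]) -> Tuple[float, float]:
    """Time a hypothetical token stream that yields one token per step."""
    start = time.perf_counter()
    times = []
    for _ in stream():
        times.append(time.perf_counter())
    return latency_metrics(start, times)


def speedup(baseline_s: float, quantized_s: float) -> float:
    """Speedup of a quantized run (e.g. W4A4) over a baseline (e.g. fp16 or bf16)."""
    return baseline_s / quantized_s
```

For example, `latency_metrics(0.0, [0.5, 0.75, 1.0, 1.25, 1.5])` gives a first-token latency of 0.5 s and 4 decode tokens in 1.0 s, i.e. 4.0 tokens/s. A `speedup` of about 1.0 means the W4A4 run takes roughly as long as the baseline, which is why the choice of fp16 vs bf16 baseline matters.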