Speedup should be much more than 6.3x on 8K resolution?
Hi, thanks for your great work! I have one question about Table 7: at 8K resolution, the TFLOPs drop from 847.73 to 3.92 (roughly 216x), so why does the overall measured result only improve from 1842.48 to 293.50 (about 6.3x)? Is FlexAttention the bottleneck here?
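To make the gap concrete, here is a minimal sketch of the arithmetic behind the question. It uses only the four figures quoted from Table 7; the variable names are illustrative, and the units of the measured values are whatever the table reports.

```python
# Figures quoted from Table 7 at 8K resolution.
tflops_before, tflops_after = 847.73, 3.92
# Overall measured values (units as reported in the table).
measured_before, measured_after = 1842.48, 293.50

# Reduction implied by the compute count alone.
theoretical_speedup = tflops_before / tflops_after
# Reduction actually observed.
measured_speedup = measured_before / measured_after
# Share of the theoretical reduction that shows up in practice.
fraction_realized = measured_speedup / theoretical_speedup

print(f"theoretical reduction: {theoretical_speedup:.1f}x")  # 216.3x
print(f"measured speedup:      {measured_speedup:.1f}x")     # 6.3x
print(f"fraction realized:     {fraction_realized:.1%}")     # 2.9%
```

So only about 3% of the reduction in compute appears as measured speedup, which is why the 6.3x figure looks low.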
Thanks for your question!
Exactly. Currently, the practical acceleration falls behind the theoretical result because of the difficulty of the implementation.