This work doesn't change the kernel, but exploits row independence to compute a whole row end to end?

Open ziyuhuang123 opened this issue 7 months ago • 0 comments

This is an excellent idea, and I have starred your repo. I want to check whether my understanding is correct:

As I understand it, the paper does not modify the kernel implementation. Instead, it relies on the fact that different rows of Q along the sequence dimension are independent of one another. Each group of query rows can therefore be computed from attention through the FFN in one go, so the intermediate results are consumed right away instead of being kept for the whole sequence, which allows larger sequence lengths.

Is it correct?
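
To make the question concrete, here is a minimal sketch of what I mean, in plain Python with no external libraries. It is my own illustration, not code from the repo: the names `layer_full` and `layer_blockwise`, the ReLU FFN, and the toy sizes are all assumptions, and it only blocks over the rows of Q (no blocking over K/V, no residuals or layer norm). It checks that running attention and then the FFN on one block of query rows at a time gives the same result as running them on the whole sequence.

```python
import math
import random

def matmul(a, b):
    # Plain list-of-lists matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(scores):
    # Numerically stable softmax applied to each row independently.
    out = []
    for row in scores:
        m = max(row)
        e = [math.exp(x - m) for x in row]
        z = sum(e)
        out.append([x / z for x in e])
    return out

def attention(q, k, v):
    # Each output row depends only on its own query row (plus all of K and V).
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = [[scale * s for s in row] for row in matmul(q, k_t)]
    return matmul(softmax_rows(scores), v)

def ffn(x, w1, w2):
    # Hypothetical two-layer ReLU FFN, applied row by row.
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

def layer_full(q, k, v, w1, w2):
    # Baseline: materialize the attention output for every row, then run the FFN.
    return ffn(attention(q, k, v), w1, w2)

def layer_blockwise(q, k, v, w1, w2, block):
    # Carry one block of query rows through attention and the FFN before
    # touching the next block, so only block-sized intermediates are alive.
    out = []
    for start in range(0, len(q), block):
        q_blk = q[start:start + block]
        out.extend(ffn(attention(q_blk, k, v), w1, w2))
    return out

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Toy sizes, chosen only for illustration.
rng = random.Random(0)
seq_len, d_model, d_ff = 8, 4, 16
q = rand_matrix(rng, seq_len, d_model)
k = rand_matrix(rng, seq_len, d_model)
v = rand_matrix(rng, seq_len, d_model)
w1 = rand_matrix(rng, d_model, d_ff)
w2 = rand_matrix(rng, d_ff, d_model)

full = layer_full(q, k, v, w1, w2)
blocked = layer_blockwise(q, k, v, w1, w2, block=2)
max_diff = max(abs(a - b) for ra, rb in zip(full, blocked) for a, b in zip(ra, rb))
```

If my understanding is right, `max_diff` should be zero up to floating-point error, and the only thing that changes between the two versions is how many intermediate rows exist at the same time.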

ziyuhuang123 Jul 02 '24 11:07