ringattention
This work doesn't change the kernel, but exploits row independence to compute a whole row at once?
Your idea is excellent, and I have starred your repo. I want to check whether my understanding is correct:
This paper does not modify the kernel implementation. Instead, it observes that different rows of Q along the sequence dimension are independent, so it computes everything from attention through the FFN in one pass per block. This frees intermediate results quickly and allows much larger sequence lengths to be computed.
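To make my understanding concrete, here is a minimal NumPy sketch of how I picture the computation. The function names, the block size, and the simplified ReLU MLP (no residuals or layernorm) are my own assumptions, not taken from the repo:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def blockwise_attn_ffn(Q, K, V, W1, W2, block=128):
    # Rows of Q are independent, so we process one query block at a
    # time, running attention *and* the FFN for that block before
    # moving on. The (block, seq_len) score matrix and the attention
    # output are discarded after each iteration, so the full
    # (seq_len, seq_len) intermediate is never held in memory.
    seq_len, d = Q.shape
    out = np.empty((seq_len, W2.shape[1]))
    for i in range(0, seq_len, block):
        q = Q[i:i + block]                       # (block, d)
        scores = q @ K.T / np.sqrt(d)            # (block, seq_len)
        attn = softmax(scores, axis=-1) @ V      # (block, d)
        out[i:i + block] = np.maximum(attn @ W1, 0) @ W2  # ReLU FFN
    return out
```

The result should match computing attention over the full sequence first and then applying the FFN; only the peak memory for intermediates changes.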
Is it correct?