exo
Ring Attention: overlapping data transfer with the computation of attention block matrices
A promising idea from the community: