NATTEN
Attention merging backward
The backward pass for attention merging needs to be handled manually: each KV branch's attention backward produces its own dQ, and the dQs from the different branches are simply added together elementwise to form the gradient with respect to the shared query.
See https://github.com/Dao-AILab/flash-attention/issues/1137
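Below is a minimal PyTorch sketch of why this works; it is not NATTEN's or flash-attention's API, and the tensor shapes plus the `attn_with_lse`/`merge` helpers are assumptions for illustration. The merged output depends on each branch only through that branch's output and log-sum-exp, so the chain rule splits dQ into one contribution per branch, and the total dQ is their elementwise sum.

```python
import torch

torch.manual_seed(0)
B, H, N, D = 1, 2, 8, 16
q = torch.randn(B, H, N, D, requires_grad=True)
k1, v1 = torch.randn(B, H, N, D), torch.randn(B, H, N, D)
k2, v2 = torch.randn(B, H, N, D), torch.randn(B, H, N, D)

def attn_with_lse(q, k, v):
    # Softmax attention that also returns the row-wise log-sum-exp,
    # which is what the merge needs.
    s = (q @ k.transpose(-2, -1)) * D ** -0.5
    lse = torch.logsumexp(s, dim=-1, keepdim=True)
    return (s - lse).exp() @ v, lse

def merge(o1, lse1, o2, lse2):
    # LSE-weighted merge of two attention outputs over disjoint KV branches.
    lse = torch.logaddexp(lse1, lse2)
    return (lse1 - lse).exp() * o1 + (lse2 - lse).exp() * o2

o1, lse1 = attn_with_lse(q, k1, v1)
o2, lse2 = attn_with_lse(q, k2, v2)
d_out = torch.randn(B, H, N, D)

# Reference: dQ of the merged output w.r.t. the shared query.
dq, = torch.autograd.grad(merge(o1, lse1, o2, lse2), q, d_out,
                          retain_graph=True)

# Branch-wise dQ: the gradient that flows back through one branch only
# (the other branch's output and LSE treated as constants), standing in
# for calling that branch's attention backward kernel by hand.
dq1, = torch.autograd.grad(merge(o1, lse1, o2.detach(), lse2.detach()),
                           q, d_out, retain_graph=True)
dq2, = torch.autograd.grad(merge(o1.detach(), lse1.detach(), o2, lse2),
                           q, d_out)

# The per-branch dQs sum elementwise to the full dQ.
print(torch.allclose(dq1 + dq2, dq, atol=1e-5))  # True
```

In practice this means a manual merged backward rescales d_out per branch, runs each branch's attention backward to get that branch's dQ (and its dK/dV), and accumulates the dQs with an elementwise add.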