Initial GQ Attn Impl, issue w/ scaled_dot_product_attention

Open saurabh111233212 opened this issue 2 years ago • 0 comments

I tried implementing grouped query attention in this PR, but it seems that PyTorch's scaled_dot_product_attention doesn't support the kind of broadcasting we'd need for this (key/value tensors with fewer heads than the query tensor). Revisit if/when this gets fixed on PyTorch's end.
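For reference, here is a minimal sketch of one common workaround, not the code from the PR: repeat each key/value head across its query group so that all three tensors have the same head count before calling scaled_dot_product_attention. The function name `grouped_query_attention`, the `(batch, heads, seq, head_dim)` layout, and the example sizes are assumptions made for illustration. This gives up the memory savings that broadcasting would have provided.

```python
import torch
import torch.nn.functional as F


def grouped_query_attention(q, k, v, is_causal=False):
    """Hypothetical helper, not from the PR.

    Assumed shapes:
      q:    (batch, n_heads,    seq, head_dim)
      k, v: (batch, n_kv_heads, seq, head_dim)
    where n_heads is a multiple of n_kv_heads.
    """
    n_heads, n_kv_heads = q.shape[1], k.shape[1]
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be a multiple of n_kv_heads")
    group_size = n_heads // n_kv_heads

    # Copy each key/value head once per query head in its group, so that
    # query head i attends with key/value head i // group_size.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)

    # All three tensors now share the same head dimension.
    return F.scaled_dot_product_attention(q, k, v, is_causal=is_causal)


# Example sizes are arbitrary: 8 query heads sharing 2 key/value heads.
torch.manual_seed(0)
q = torch.randn(2, 8, 5, 16)
k = torch.randn(2, 2, 5, 16)
v = torch.randn(2, 2, 5, 16)
out = grouped_query_attention(q, k, v, is_causal=True)
print(out.shape)
```

Later PyTorch releases document an `enable_gqa` argument on scaled_dot_product_attention; whether it is available depends on the installed version, so check the documentation for your release before relying on it.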

Aug 14 '23 18:08 saurabh111233212