OLMo
PyTorch scaled_dot_product_attention doesn't support broadcast for grouped-query-attention
❓ The question
I tried implementing grouped-query attention in this pull request, but it seems that PyTorch's scaled_dot_product_attention doesn't support the kind of broadcasting we'd need for this. Revisit if/when this gets fixed on PyTorch's end.
TODO: decide ourselves how to broadcast the K and V tensors to match the Q shape when using grouped-query attention. To be revisited...
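For reference, a minimal sketch of one possible workaround: manually expand the K/V heads to match the number of query heads before calling scaled_dot_product_attention, so no broadcasting is required on PyTorch's side. This is not OLMo's actual implementation; the shapes and names (n_heads, n_kv_heads, group_size) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes (assumptions, not OLMo's config).
batch, seq_len, head_dim = 2, 16, 64
n_heads, n_kv_heads = 8, 2  # grouped-query attention: 8 query heads share 2 KV heads

q = torch.randn(batch, n_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Repeat each KV head (n_heads // n_kv_heads) times along the head dimension
# so K and V have the same number of heads as Q.
group_size = n_heads // n_kv_heads
k = k.repeat_interleave(group_size, dim=1)
v = v.repeat_interleave(group_size, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 16, 64])
```

The trade-off is that repeat_interleave materializes the expanded K/V tensors, so it costs extra memory compared with true broadcasting inside the attention kernel.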
We apologize for the delay in our response. In order to help surface current, unresolved issues, we are closing tickets prior to February 29. Please reopen your ticket if you are continuing to experience this issue. Thank you!