Fused Attention Kernel with gfx1030?

Open · onesnep opened this issue 2 years ago

I was glad to see Flash Attention ported to ROCm; however, compatibility is currently limited to gfx90a. I and many others would love to see it on other architectures.

When building Composable Kernel against a gfx1030 target, I noticed that the fused attention examples were excluded from the test cases. The docs briefly mention partial compatibility for gfx1030, but I couldn't find concrete details on how operator support differs between architectures.
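From poking at the build, my understanding (which may be wrong) is that HIP device code is compiled separately for each target in GPU_TARGETS, and architecture-specific fast paths are typically gated on the compiler's per-target macros, so an MFMA-only kernel simply isn't emitted for gfx1030. A minimal sketch of that pattern - the `__gfx908__`/`__gfx90a__` macros are real amdclang per-target definitions, but the guard and kernel below are illustrative, not CK's actual code:

```cpp
#include <hip/hip_runtime.h>

// Hypothetical kernel showing the per-target gating pattern. The amdclang
// device compiler defines a macro per offload target (e.g. __gfx90a__), so
// code that needs matrix-core (MFMA) instructions can be compiled only on
// architectures that have them.
__global__ void fused_attention_sketch(float* out, const float* q,
                                       const float* k, int n)
{
#if defined(__gfx908__) || defined(__gfx90a__)
    // CDNA targets: an MFMA-backed fused attention body would live here.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = q[i] * k[i]; // placeholder, not the real fused math
#else
    // Targets without MFMA (gfx1030, gfx906, ...): nothing useful is emitted,
    // so the FA examples and tests drop out of such builds entirely.
    (void)out; (void)q; (void)k; (void)n;
#endif
}
```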

I would appreciate clarification on whether a fused kernel suitable for Flash Attention is feasible on other architectures such as gfx1030 or even gfx1100, and if so, whether this is in the pipeline or left to the community to implement.

Many thanks

onesnep · Sep 01 '23

I'm also curious about this - specifically about support for the gfx906 architecture.

ThePerfectComputer · Jan 23 '24

> I'm also curious about this - specifically about support for the gfx906 architecture.

Hello, I'm curious about gfx906 too - have you gotten it working already?

linchen111 · Jul 30 '24

Hello @onesnep, @ThePerfectComputer, @linchen111,

Composable Kernel does not support Flash Attention on other architectures at the moment - only on recent Instinct devices. None of the architectures mentioned in this thread (gfx1030, gfx1100, gfx906) support FA.
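In the meantime, if you're integrating CK and want to fail gracefully on unsupported hardware, one option is to check the device architecture at runtime before selecting a fused attention path. A minimal sketch using the standard HIP device-properties API - the accepted-architecture list below is illustrative, not an authoritative support matrix:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>
#include <cstring>

// Returns true if the device reports an architecture assumed here to have
// fused attention instances (the list is illustrative, not authoritative).
bool fused_attention_supported(int device_id)
{
    hipDeviceProp_t props;
    if (hipGetDeviceProperties(&props, device_id) != hipSuccess)
        return false;
    // gcnArchName looks like "gfx90a:sramecc+:xnack-", so compare the prefix.
    return std::strncmp(props.gcnArchName, "gfx90a", 6) == 0 ||
           std::strncmp(props.gcnArchName, "gfx908", 6) == 0;
}

int main()
{
    std::printf(fused_attention_supported(0)
                    ? "fused attention path available\n"
                    : "falling back to an unfused attention path\n");
    return 0;
}
```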

However, your comments have been heard by the Composable Kernel team. They are aware of the requests to port FA to other architectures, and this task is in the pipeline, though we unfortunately can't give you a more specific timeline at the moment.

Thanks for your feedback!

jamesxu2 · Oct 02 '24