
Hstu attention n0loop fused unroll pr

Open qianfengz opened this issue 2 months ago • 0 comments

This PR adds an implementation of HSTU attention to ck_tile. HSTU attention differs substantially from the FMHA already implemented in ck_tile; for details, please refer to the HSTU paper.
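For readers who do not know the paper, the sketch below illustrates one commonly cited difference from softmax-based FMHA: the scores go through a pointwise SiLU and are scaled by the sequence length instead of being normalized by a row-wise softmax. This is a minimal single-head reference in plain Python, written under that assumption. The function names, the `1/n` scaling, and the optional causal mask are illustrative choices. It leaves out relative attention bias, jagged batching, and everything specific to the ck_tile kernels in this PR.

```python
import math


def silu(x):
    # SiLU (swish): x * sigmoid(x). Fine for a sketch; overflows for very negative x.
    return x / (1.0 + math.exp(-x))


def hstu_style_attention(q, k, v, causal=True):
    """Single-head reference: out[i] = sum_j silu(q[i] . k[j]) / n * v[j].

    q, k: lists of n vectors of equal length; v: list of n value vectors.
    Assumption for illustration: pointwise SiLU replaces softmax and the
    weights are scaled by 1/n, so rows of weights need not sum to 1.
    """
    n = len(q)
    d_v = len(v[0])
    out = [[0.0] * d_v for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if causal and j > i:
                continue  # position i may not attend to later positions
            score = sum(a * b for a, b in zip(q[i], k[j]))
            weight = silu(score) / n
            for c in range(d_v):
                out[i][c] += weight * v[j][c]
    return out


if __name__ == "__main__":
    q = [[1.0], [0.0]]
    k = [[1.0], [1.0]]
    v = [[2.0], [4.0]]
    print(hstu_style_attention(q, k, v, causal=True))
```

Because each weight depends only on its own score, there is no running row maximum or row sum to carry across key tiles, which is one reason a kernel for this kind of attention can be structured differently from a softmax-based one.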

The implementation has been verified on MI300 for both functional correctness and target performance, but it includes no optimizations specific to MI350.

To build:

#> cd build; ../scripts/cmake-ck-dev.sh .. gfx942; make -j 128 tile_example_hstu_attention

To verify:

#> . examples/ck_tile/23_hstu_attention/scripts/test_hstu_attention.sh

All HSTU code is located under the folder examples/ck_tile/23_hstu_attention, but this PR also makes some small changes to the core ck_tile code under include/ck_tile/core/tensor.

qianfengz, Sep 22 '25 13:09