Add flex attention example

Open tenpercent opened this issue 10 months ago • 1 comments

Proposed changes

FlexAttention is a customization of Fused Multi-Head Attention in which the attention scores can be modified by a user-supplied function score_mod(score: float, batch_idx: int, head_idx: int, q_idx: int, v_idx: int) -> new_score: float
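To make the score_mod contract concrete, here is a minimal host-side sketch in pure Python. It is not the code from this PR: the helper names (reference_attention, identity, causal) are hypothetical, and applying the 1/sqrt(d) scale before calling score_mod is an assumption made for illustration.

```python
import math


def reference_attention(q, k, v, score_mod, batch_idx=0, head_idx=0):
    """Single-head attention on plain lists.

    q: [Lq][d], k: [Lk][d], v: [Lk][dv]. Hypothetical helper, not the PR's code.
    """
    scale = 1.0 / math.sqrt(len(q[0]))  # assumption: scale is applied before score_mod
    out = []
    for q_idx, q_row in enumerate(q):
        # Raw scaled dot-product score for every key, passed through score_mod.
        scores = []
        for v_idx, k_row in enumerate(k):
            s = scale * sum(a * b for a, b in zip(q_row, k_row))
            scores.append(score_mod(s, batch_idx, head_idx, q_idx, v_idx))
        # Numerically stable softmax over the modified scores.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Weighted sum of the value rows.
        out.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v)) / total
            for j in range(len(v[0]))
        ])
    return out


def identity(score, batch_idx, head_idx, q_idx, v_idx):
    """Leaves the score unchanged, so the result is ordinary attention."""
    return score


def causal(score, batch_idx, head_idx, q_idx, v_idx):
    """Masks out keys that come after the query position."""
    return score if v_idx <= q_idx else -math.inf


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
plain_out = reference_attention(q, k, v, identity)
causal_out = reference_attention(q, k, v, causal)
```

With the causal modifier, the first query can attend only to the first key, so its output row equals the first row of v. Any function with the same five-argument signature can be substituted in the same way.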

Added a new example that copies and customizes the (1) code generation, (2) pipelines, and (3) kernel from 01_fmha

The score modifier is a command-line argument to generate.py

The source of truth for the score modifier is a variable defined in CMakeLists

Running:

  • (only once, create workdir) mkdir build && cd build
  • (optional, clean up the generated files) rm -r example/ck_tile/18_flexattn/
  • cmake .. -G Ninja -D CMAKE_PREFIX_PATH=/opt/rocm -D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc -D CMAKE_BUILD_TYPE=Release -D GPU_TARGETS="gfx942"
  • ninja -j128 tile_example_flexattn_fwd
  • single run with default parameters: ./bin/tile_example_flexattn_fwd
  • full test: in composable_kernel folder, ./example/ck_tile/18_flexattn/script/run_full_test.sh

  • (done) added correctness check with host
  • (done) debug numerical mismatch for batch-mode kernels; the device and host results now match for these kernels
  • (done) re-add group-mode kernels for decoding
  • (TBD) debug performance; the customized version is currently ~3x slower than the original
  • (done) revise indexing in group-mode, since numerical mismatches reappeared after adding these kernels

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • [x] I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • [ ] I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • [ ] I have added inline documentation which enables the maintainers with understanding the motivation
  • [ ] I have removed the stale documentation which is no longer relevant after this pull request
  • [ ] (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • [x] I have run clang-format on all changed files
  • [x] Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

tenpercent, Jan 29 '25 03:01

Awesome!

zjing14, Jan 30 '25 01:01

Archived at https://github.com/tenpercent/composable_kernel/tree/cktile-flexattn

tenpercent, Jun 04 '25 17:06