composable_kernel

Extend support for ak1 / bk1 WMMA

Open EnricoDeg opened this issue 1 month ago • 0 comments

Proposed changes

Summary:

  • Add support for AK1 != BK1
  • Add support for AK1, BK1 > 8
  • Introduce a KInner template parameter for pipelines, used when a single load instruction covers multiple tiles

Scenarios now supported:

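| A Type | B Type | AK1 | BK1 | A Load Vmem | A Store LDS | A Load LDS | B Load Vmem | B Store LDS | B Load LDS | KInner |
|--------|--------|-----|-----|-------------|-------------|------------|-------------|-------------|------------|--------|
| FP16   | FP16   | 2   | 8   | 32 bit      | 32 bit      | 32 bit     | 128 bit     | 128 bit     | 128 bit    | 1      |
| FP8    | FP8    | 16  | 16  | 128 bit     | 128 bit     | 128 bit    | 128 bit     | 128 bit     | 128 bit    | 2      |
| FP8    | FP16   | 16  | 8   | 128 bit     | 128 bit     | 64 bit     | 128 bit     | 128 bit     | 128 bit    | 1      |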
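
For reviewers who want to sanity-check the widths in the table above: the Vmem load widths are consistent with element size times K1. The snippet below is only a sketch of that arithmetic. `vector_access_bits` and the two bit-width constants are hypothetical names introduced for illustration, not identifiers from composable_kernel, and the snippet does not model how the pipelines actually select their vector widths.

```cpp
#include <cstdint>

// Hypothetical helper (not part of composable_kernel): number of bits moved
// by one vectorized access of `k1` contiguous K elements of a given width.
constexpr std::uint32_t vector_access_bits(std::uint32_t element_bits, std::uint32_t k1)
{
    return element_bits * k1;
}

// Element widths assumed for this illustration.
constexpr std::uint32_t kFp16Bits = 16;
constexpr std::uint32_t kFp8Bits  = 8;

// Row 1: A = FP16 with AK1 = 2, B = FP16 with BK1 = 8.
static_assert(vector_access_bits(kFp16Bits, 2) == 32, "A Load Vmem, row 1");
static_assert(vector_access_bits(kFp16Bits, 8) == 128, "B Load Vmem, row 1");

// Row 2: A = FP8 with AK1 = 16, B = FP8 with BK1 = 16.
static_assert(vector_access_bits(kFp8Bits, 16) == 128, "A Load Vmem, row 2");
static_assert(vector_access_bits(kFp8Bits, 16) == 128, "B Load Vmem, row 2");

// Row 3: A = FP8 with AK1 = 16, B = FP16 with BK1 = 8.
static_assert(vector_access_bits(kFp8Bits, 16) == 128, "A Load Vmem, row 3");
static_assert(vector_access_bits(kFp16Bits, 8) == 128, "B Load Vmem, row 3");
```

The 64 bit A Load LDS entry in the third row corresponds to 8 FP8 elements per access rather than 16, so the LDS read width there is not simply AK1 times the element size. This sketch deliberately leaves that case out.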

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • [x] I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • [x] I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • [x] I have added inline documentation which helps the maintainers understand the motivation
  • [x] I have removed the stale documentation which is no longer relevant after this pull request
  • [x] (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • [x] I have run clang-format on all changed files
  • [x] Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

EnricoDeg • Oct 22 '25 07:10