composable_kernel

Ck moe mxfp4 blockm32

Open xudoyuan opened this issue 1 month ago • 2 comments

Proposed changes

Update the CK MXFP4 MoE kernels on gfx950 to:

  1. Support block_m=32 in the DeepSeek TP8 model.
  2. Implement the MoE GEMM2 v1 pipeline.

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • [ ] I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • [ ] I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • [ ] I have added inline documentation which enables the maintainers with understanding the motivation
  • [ ] I have removed the stale documentation which is no longer relevant after this pull request
  • [ ] (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • [ ] I have run clang-format on all changed files
  • [ ] Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

Oct 27 '25 09:10 xudoyuan

Can you please describe the purpose of this PR and make sure the CI is passing? Is this related to #3092? If so, perhaps, you should combine them.

Oct 28 '25 16:10 illsilin

@xudoyuan Pinging again: please provide a proper description of the changes you intend to make and address the comments. Thank you!

Oct 30 '25 17:10 cgmillette

> Can you please describe the purpose of this PR and make sure the CI is passing? Is this related to #3092? If so, perhaps, you should combine them.

@illsilin It fixes a bug in the mxfp4 preshuffle GEMM with block_m=32 and also optimizes the v1 pipeline for that configuration. We got a 10% uplift for small M.

Nov 06 '25 08:11 coderfeli