Bf16*fp4 gemm

Open eliotwang opened this issue 3 months ago • 2 comments

Proposed changes

Adds an example of a bf16*fp4 GEMM in which both the fp4 values and the fp4 scales are stored as uint8. In the pipeline, matrix B (fp4) is dequantized to bf16 before the multiplication is performed.
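For readers unfamiliar with the layout, the dequantization step can be sketched on the host roughly as follows. This is a minimal illustration and not the code in this PR. It assumes the fp4 values use the E2M1 encoding and the scales use the E8M0 encoding from the OCP MX format, that the low nibble of each byte holds the even-indexed element, and that one scale covers `block_size` consecutive elements. The names `decode_fp4`, `decode_e8m0`, and `dequantize_b` are hypothetical, and `float` stands in for bf16.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Magnitudes of the eight E2M1 (fp4) codes, indexed by the low 3 bits.
constexpr float kFp4Magnitude[8] = {0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};

// Decode one 4-bit E2M1 code; bit 3 is the sign bit.
inline float decode_fp4(std::uint8_t nibble)
{
    const float mag = kFp4Magnitude[nibble & 0x7];
    return (nibble & 0x8) ? -mag : mag;
}

// Decode an E8M0 scale byte as 2^(e - 127).
// The NaN encoding (0xFF) is not handled in this sketch.
inline float decode_e8m0(std::uint8_t e)
{
    return std::ldexp(1.0f, static_cast<int>(e) - 127);
}

// Unpack two fp4 values per byte and apply the per-block scale.
// Assumption: low nibble first; one scale per `block_size` elements.
std::vector<float> dequantize_b(const std::vector<std::uint8_t>& packed,
                                const std::vector<std::uint8_t>& scales,
                                std::size_t block_size = 32)
{
    const std::size_t n = packed.size() * 2;
    assert(block_size > 0 && scales.size() * block_size >= n);

    std::vector<float> out(n);
    for(std::size_t k = 0; k < n; ++k)
    {
        const std::uint8_t byte   = packed[k / 2];
        const std::uint8_t nibble = (k % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
        out[k] = decode_fp4(nibble) * decode_e8m0(scales[k / block_size]);
    }
    return out;
}
```

In a real kernel the unpacking would happen per tile inside the pipeline rather than on the whole matrix, but the arithmetic per element is the same under the assumptions above.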

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • [ ] I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • [ ] I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • [ ] I have added inline documentation which helps the maintainers understand the motivation
  • [ ] I have removed the stale documentation which is no longer relevant after this pull request
  • [ ] (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • [ ] I have run clang-format on all changed files
  • [ ] Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

eliotwang avatar Sep 08 '25 12:09 eliotwang

Hi, there is already a bf16*mxfp4 gemm implementation (mixed_prec_flatmm) in #3022.

DDEle avatar Oct 20 '25 02:10 DDEle

> Hi, there is already a bf16*mxfp4 gemm implementation (mixed_prec_flatmm) in #3022.

The functionality is somewhat similar, but in our example, the format of matrix B is uint8_t (each element in matrix B is composed of two pk_fp4_t packed together), primarily to match the data format used in models like GPT-OSS. Additionally, I ran the example from the link you provided, and the results it displayed appeared to be incorrect. Are there any usage limitations for that example?

eliotwang avatar Oct 20 '25 10:10 eliotwang

Hi @eliotwang, please resolve conflicts and sync branch to latest develop in order to proceed! Thanks!

illsilin avatar Nov 13 '25 18:11 illsilin

@eliotwang LGTM overall. Please add the unit test.

ThomasNing avatar Nov 17 '25 23:11 ThomasNing

> @eliotwang LGTM overall. Please add the unit test.

We have added unit tests for bf16_mxfp4_gemm in the test/ck_tile/gemm_block_scale/ directory. Please help review it.

eliotwang avatar Nov 19 '25 03:11 eliotwang

@eliotwang LGTM. We can do the final merge iteration once PR #3245 has been merged into develop. Thanks!

cc. @CongMa13

ThomasNing avatar Nov 21 '25 00:11 ThomasNing