composable_kernel

[Draft] | GPUAI-3720 - Integrate Universal GEMM into Grouped GEMM - Pt 1

Open • rtmadduri opened this issue 11 months ago • 0 comments

Proposed changes

This PR integrates Universal GEMM into Device Grouped GEMM. Specifically, in device_grouped_gemm_xdl_splitk_cshuffle.hpp we replace GridwiseGemm_bk0mk1_bk0nk1_mn_xdlops_v2r4r2 with GridwiseGemm_xdl_cshuffle_v3.

We make corresponding changes to struct Argument and struct Invoker.

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • [ ] I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • [ ] I have added inline documentation which helps the maintainers understand the motivation
  • [ ] I have removed the stale documentation which is no longer relevant after this pull request
  • [ ] (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • [ ] I have run clang-format on all changed files
  • [ ] Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

rtmadduri, Jan 07 '25 17:01