[CK_TILE] Support moe with up gemm
Proposed changes
Support fused MoE with up gemm.
Checklist
Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.
- [x] I have added tests relevant to the introduced functionality, and the unit tests are passing locally
- [x] I have added inline documentation which enables the maintainers with understanding the motivation
- [x] I have removed the stale documentation which is no longer relevant after this pull request
- [ ] (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
- [x] I have run `clang-format` on all changed files
- [x] Any dependent changes have been merged
Discussion
If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered
Please note that calling flatmm twice loads matrix A from the buffer twice, which reduces overall performance. This is a temporary solution to unblock gate+up. For production, I think we need to carefully consider changing the flatmm inline asm block to support both gate+up and gate-only before merging into the mainline.
covered by this PR: https://github.com/ROCm/composable_kernel/pull/1808
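The cost of the duplicated load of A can be illustrated with a small CPU-side sketch. This is only an analogy, not the flatmm kernel or any CK_TILE API: `LoadCounter`, `load_a`, `gemm`, `gemm_gate_up_fused`, and `compare_a_loads` are hypothetical names invented for this example, and a counted array read stands in for a buffer load.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in: counts how often an element of A is read.
struct LoadCounter { std::size_t a_loads = 0; };

// Reads one element of A and records the access (models a buffer load).
inline float load_a(const std::vector<float>& A, std::size_t idx, LoadCounter& c) {
    ++c.a_loads;
    return A[idx];
}

// Plain GEMM: C[M x N] += A[M x K] * B[K x N]. C must be zero-initialized.
void gemm(const std::vector<float>& A, const std::vector<float>& B,
          std::vector<float>& C, std::size_t M, std::size_t N, std::size_t K,
          LoadCounter& c) {
    assert(A.size() == M * K && B.size() == K * N && C.size() == M * N);
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t k = 0; k < K; ++k) {
            const float a = load_a(A, m * K + k, c);
            for (std::size_t n = 0; n < N; ++n)
                C[m * N + n] += a * B[k * N + n];
        }
}

// Fused gate+up: each element of A is loaded once and reused for both weights.
void gemm_gate_up_fused(const std::vector<float>& A,
                        const std::vector<float>& Wgate,
                        const std::vector<float>& Wup,
                        std::vector<float>& gate, std::vector<float>& up,
                        std::size_t M, std::size_t N, std::size_t K,
                        LoadCounter& c) {
    assert(A.size() == M * K && Wgate.size() == K * N && Wup.size() == K * N);
    gate.assign(M * N, 0.0f);
    up.assign(M * N, 0.0f);
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t k = 0; k < K; ++k) {
            const float a = load_a(A, m * K + k, c);
            for (std::size_t n = 0; n < N; ++n) {
                gate[m * N + n] += a * Wgate[k * N + n];
                up[m * N + n]   += a * Wup[k * N + n];
            }
        }
}

struct LoadComparison {
    std::size_t separate;  // A loads with two independent GEMM calls
    std::size_t fused;     // A loads with the fused gate+up loop
    bool same_result;      // whether both approaches produce identical outputs
};

// Runs both variants on constant test data and reports the A-load counts.
LoadComparison compare_a_loads(std::size_t M, std::size_t N, std::size_t K) {
    std::vector<float> A(M * K, 1.0f), Wgate(K * N, 2.0f), Wup(K * N, 3.0f);
    std::vector<float> g1(M * N, 0.0f), u1(M * N, 0.0f), g2, u2;
    LoadCounter sep, fus;
    gemm(A, Wgate, g1, M, N, K, sep);  // first call loads all of A
    gemm(A, Wup, u1, M, N, K, sep);    // second call loads all of A again
    gemm_gate_up_fused(A, Wgate, Wup, g2, u2, M, N, K, fus);
    return {sep.a_loads, fus.a_loads, g1 == g2 && u1 == u2};
}
```

In this model the two separate calls read A `2 * M * K` times, while the fused loop reads it `M * K` times and produces the same gate and up outputs. How much of that saving carries over to the real kernel depends on the inline asm block and on caching, which this sketch does not model.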