composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
1. K padding 2. N padding
## Proposed changes - add preshuffle gemm fp16 - same logic as fp8 and tune configs to reach 2.5TB/s ## Checklist Please put an `x` into the boxes that apply....
### Problem Description Running example binaries generated within composable_kernel/build/bin with GPU verification fails. CPU verification succeeds. See below:  PS: ROCm version 6.3.0 not 6.0.0 ### Operating System Ubuntu 22.04.2...
## Proposed changes Please describe the motivation behind the pull request, whether it enables a new feature or fixes a bug. If there are associated pull requests or issues, please...
## Proposed changes Add document of FMHA kernel ## Checklist Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If...
## Proposed changes As titled ## Checklist Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure,...
The moe_sorting kernel produces a compute error when num_tokens > 13K. The error is reproducible in aiter but cannot be reproduced in tile_example_moe_sorting.
## Proposed changes **add basic flatmm based on ck_tile:** - flatmm is placed in a separate example folder - flatmm is using dependent kernel and pipeline and block function -...