composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
## Proposed changes One of the few libraries that still consumes CK static libs, MIOpen, is currently only using the convolution library, and it only needs certain data layouts. So...
Added XCD remapping for flatmm moe

| batch | Mixtral (tflops, wip_355) | Mixtral-7B (tflops, our branch) | perf boost |
| -- | -- | -- | -- |
| 64 | 865.424 |...
## Proposed changes Changes made over the past couple of weeks broke the V5 pipeline. The proposed code changes fix this issue. Following...
## Proposed changes Change the preshuffle format for gfx950.
## Proposed changes Enable hdim=96/160/192 instances in fmha fwd and turn on tests for them.
## Proposed changes
Summary:
- Modify gridwise implementation to work with convolution (grid descriptors are not created internally but passed from the device level)
- Add device level implementation: `DeviceGroupedConvBwdWeight_Wmma_CShuffleV3`...
## Proposed changes Shard grouped conv bwd data instances to improve compilation time.
Draft just to check upstream CI for now.
## Proposed changes To support further codegen unification, refactor the common dispatch code out.