composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
## Proposed changes This PR adds RRR layout support for aquant in `example/ck_tile/38_block_scale_gemm/gemm_quant_basic.cpp`. Unit tests are added. This PR is not updated with develop. It awaits the regression introduced in...
## Proposed changes Update s_waitcnt fields for gfx11 architecture.
## Proposed changes [CK TILE] Convolution: remove magic values.
## Proposed changes Merge the two APIs `fmha_fwd()` and `fmha_fwd_v3()`.
- Added s_buffer_load_b32/64 assembly
- Added amd_s_buffer_load_impl
## Proposed changes We copied the GEMM tile engine code and revised it to align with StreamK. **Note**: The entire tile engine will be refactored to extract common...
Convolution descriptions.
Co-authors: @Chi-Chu319 @juuso-oskari

Added XCD remapping for flatmm moe

| batch | Mixtral (tflops, wip_355) | Mixtral-7B (tflops, our branch) | perf boost |
| -- | -- | -- | -- |
| 64...
Authors: @Chi-Chu319 @juuso-oskari This PR implements a unified attention kernel written in CK Tile. It builds on fmha_v3 (composable_kernel/example/ck_tile/01_fmha), with the pipeline largely unchanged. This...