composable_kernel

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

Results: 276 composable_kernel issues

## Proposed changes This PR adds RRR layout support for aquant in `example/ck_tile/38_block_scale_gemm/gemm_quant_basic.cpp`. Unit tests are added. This PR has not been updated against develop. It awaits the regression introduced in...

## Proposed changes Update the s_waitcnt fields for the gfx11 architecture. ## Checklist Please put an `x` into the boxes that apply. You can also fill these out after creating the PR....

## Proposed changes [CK TILE] Convolution: remove magic values ## Checklist Please put an `x` into the boxes that apply. You can also fill these out after creating the PR....

## Proposed changes Merge the two APIs `fmha_fwd()` and `fmha_fwd_v3()` into one. ## Checklist Please put an `x` into the boxes that apply. You can also fill these out after creating...

- Added s_buffer_load_b32/64 assembly
- Added amd_s_buffer_load_impl

## Proposed changes Please describe the motivation behind the pull request, whether it enables a new feature or fixes a bug. If there are associated...

## Proposed changes We copied the GEMM tile engine code and revised it to align with StreamK. **Note**: The entire tile engine will be refactored to extract common...

Co-authors: @Chi-Chu319 @juuso-oskari Added XCD remapping for flatmm moe

| batch | Mixtral (tflops, wip_355) | Mixtral-7B (tflops, our branch) | perf boost |
| -- | -- | -- | -- |
| 64...

## Proposed changes Please describe the motivation behind the pull request, whether it enables a new feature or fixes a bug. If there are associated pull requests or issues, please...

Authors: @Chi-Chu319 @juuso-oskari This PR implements a unified attention kernel written in CK Tile. It builds on top of the fmha_v3 (composable_kernel/example/ck_tile/01_fmha) with the pipeline largely remaining the same. This...