Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

Results 276 composable_kernel issues

## Proposed changes - add preshuffle gemm fp16 - same logic as fp8 and tune configs to reach 2.5TB/s ## Checklist Please put an `x` into the boxes that apply....

### Problem Description Running example binaries generated within composable_kernel/build/bin with GPU verification fails. CPU verification succeeds. See below: ![Image](https://github.com/user-attachments/assets/4067a572-1108-4617-ad09-ae000ecc6d43) PS: ROCm version 6.3.0 not 6.0.0 ### Operating System Ubuntu 22.04.2...

Under Investigation

## Proposed changes Please describe the motivation behind the pull request, whether it enables a new feature or fixes a bug. If there are associated pull requests or issues, please...

## Proposed changes Add document of FMHA kernel ## Checklist Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If...

## Proposed changes As titled ## Checklist Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure,...

compilation time

The moe_sorting kernel produces a compute error when num_tokens > 13K. Reproducible in aiter; cannot be reproduced in tile_example_moe_sorting.

## Proposed changes **add basic flatmm based on ck_tile:** - flatmm is placed in a separate example folder - flatmm is using dependent kernel and pipeline and block function -...
