composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
The namespace doesn't make sense: A host class is derived from a device class.
## Proposed changes Add fmha fwd splitkv receipt for the aiter C++ API; fix other mha codegen receipt issue to reduce the number of instances ## Checklist Please put an `x` into...
## Proposed changes Please describe the motivation behind the pull request, whether it enables a new feature or fixes a bug. If there are associated pull requests or issues, please...
## Proposed changes Fuse activation and moe ffn1. Supported activations: 0: gelu, 1: silu, 2: swiglu. ## Checklist Please put an `x` into the boxes that apply. You can also...
## Proposed changes Introduces new MX GEMM pipeline for microscaling (MX) data types. At this time, MX FP8 has been verified. Support for more data types is coming soon. ##...
## Proposed changes paged fa for batch prefill ## Checklist Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If...
## Proposed changes Add support for building the ck tile examples package ## Checklist Please put an `x` into the boxes that apply. You can also fill these out after creating...
## Proposed changes Add 2:4 structured sparsity support for ck tile fp16 gemm ## Checklist Please put an `x` into the boxes that apply. You can also fill these out...
## Proposed changes Remove scratch usage from universal gemm by moving the kbatch-related if condition outside of the kernel and passing the memory operation enum as a template parameter ## Checklist...