
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

Results 151 composable_kernel issues

### Problem Description

To debug `02_gemm_add_add_fastgelu` with the client API, I tried to enable `arg.Print()` under `Invoker::Run()` as follows:

```c++
// Invoker
struct Invoker : public BaseInvoker { using Argument =...
```

Under Investigation

Add a new `fmha_fwd_appendkv()` API that runs ahead of the `fmha_fwd()`/`fmha_fwd_splitkv()` API. The `fmha_fwd_appendkv()` + `fmha_fwd()`/`fmha_fwd_splitkv()` combination implements the functionality of `mha_fwd_kvcache()` in FA 2.5 (without the paged-kvcache part).
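The two-stage pattern described above (append new keys/values to a cache, then run attention over the whole cache) can be sketched in plain C++ on the host. This is only an illustration of the idea, not the Composable Kernel API: `KVCache`, `append_kv`, and `attend` are hypothetical names, and the sketch assumes a single head, a single query row, and no paging or masking.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<float>;

// Hypothetical single-head cache: one key row and one value row per cached token.
struct KVCache {
    std::vector<Vec> k;
    std::vector<Vec> v;
};

// Stage 1 (the "append KV" step): copy the new tokens' keys and values onto the cache.
void append_kv(KVCache& cache, const std::vector<Vec>& k_new, const std::vector<Vec>& v_new) {
    assert(k_new.size() == v_new.size());
    cache.k.insert(cache.k.end(), k_new.begin(), k_new.end());
    cache.v.insert(cache.v.end(), v_new.begin(), v_new.end());
}

// Stage 2 (the "forward" step): softmax(q . K^T / sqrt(d)) * V for one query row.
// Assumes every key row has the same length as q.
Vec attend(const Vec& q, const KVCache& cache) {
    const std::size_t n = cache.k.size();
    if (n == 0) return Vec{};
    const float scale = 1.0f / std::sqrt(static_cast<float>(q.size()));

    // Scaled dot products, tracking the maximum for a numerically stable softmax.
    Vec scores(n, 0.0f);
    float max_score = -std::numeric_limits<float>::infinity();
    for (std::size_t i = 0; i < n; ++i) {
        float dot = 0.0f;
        for (std::size_t j = 0; j < q.size(); ++j) dot += q[j] * cache.k[i][j];
        scores[i] = dot * scale;
        max_score = std::max(max_score, scores[i]);
    }

    // Exponentiate and accumulate the softmax denominator.
    float denom = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        scores[i] = std::exp(scores[i] - max_score);
        denom += scores[i];
    }

    // Weighted sum of the cached value rows.
    Vec out(cache.v[0].size(), 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        const float w = scores[i] / denom;
        for (std::size_t j = 0; j < out.size(); ++j) out[j] += w * cache.v[i][j];
    }
    return out;
}
```

In a decoding loop under these assumptions, each step would call `append_kv` with the newly produced key/value rows and then `attend` with the current query, so the attention always sees the full cached history.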

This will reduce the size of binaries built with ROCm 6.2+ compilers by at least 50%.

This will help prevent CI pipeline crashes caused by nodes running out of disk space.

Added structural sparsity blockwise gemm

Enabled bf16 atomic_add on MI300

* This PR generates the MHA static library from `generate.py`.