
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

Results: 100 composable_kernel issues

Remove any controversial terminology or info from the repo.

- Add element op
- Add instances
- Add example
- Add client example

CI - Pass

Add unit tests for the two-stage grouped GEMM, covering all existing instances. Also fix it to allow skipping empty GEMMs.

(https://github.com/ROCm/composable_kernel/blob/764164b488a9009842c0ce4b14aa74d49eec5e6a/include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v5r1.hpp#L147C1-L157C20) This part of the code seems to implement the space curve algorithm incorrectly. For example, if `ordered_src_access_idx = Sequence; ordered_src_access_lengths = Sequence;`, then when `i = 3`, tmp's result is `ordered_src_access_idx[0]...
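For readers unfamiliar with the idea, the sketch below shows one generic way to build a snake-order (back-and-forth) traversal, in which each step changes exactly one coordinate by one. It is an illustration under assumptions, not the composable_kernel implementation, and it does not attempt to reproduce or confirm the defect reported above. The names `snake_order` and `is_unit_step_path` are hypothetical and do not exist in the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper (not part of composable_kernel): list every multi-index
// of a tensor with the given per-dimension lengths in snake order.
std::vector<std::vector<int>> snake_order(const std::vector<int>& lengths)
{
    std::vector<std::vector<int>> path{{}}; // start from one empty prefix
    for(int len : lengths)
    {
        assert(len >= 0 && "lengths must be non-negative");
        std::vector<std::vector<int>> next;
        for(std::size_t k = 0; k < path.size(); ++k)
        {
            // Sweep the new dimension forward after an even number of
            // outer steps and backward after an odd number.
            const bool forward = (k % 2 == 0);
            for(int s = 0; s < len; ++s)
            {
                std::vector<int> idx = path[k];
                idx.push_back(forward ? s : len - 1 - s);
                next.push_back(idx);
            }
        }
        path = next;
    }
    return path;
}

// Hypothetical check: true when each consecutive pair of indices differs
// by exactly 1 in exactly one dimension.
bool is_unit_step_path(const std::vector<std::vector<int>>& path)
{
    for(std::size_t k = 1; k < path.size(); ++k)
    {
        int distance = 0;
        for(std::size_t d = 0; d < path[k].size(); ++d)
            distance += std::abs(path[k][d] - path[k - 1][d]);
        if(distance != 1)
            return false;
    }
    return true;
}
```

The sweep direction here depends on how many steps the outer dimensions have already taken along the path, not on the parity of a linearized index. Calling `is_unit_step_path(snake_order({2, 2, 2}))` is one way to check that a traversal has no jumps.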

### Problem Description

I was able to build [flash-attention ROCM](https://github.com/ROCmSoftwarePlatform/flash-attention) for both my MI100 and MI50 cards, but only got flash attention working on the MI100 (very impressive performance I might...

A recent clang change (https://github.com/llvm/llvm-project/pull/90152) revealed an issue in the develop branch of composable_kernel: https://github.com/ROCm/composable_kernel/blob/08d51d9bc4ec275fce3ad0a01a08ab1fd45636bc/include/ck/tensor_operation/gpu/block/blockwise_gemm_xdlops.hpp#L799

```
composable_kernel/include/ck/tensor_operation/gpu/block/blockwise_gemm_xdlops.hpp:799:32: error: no member named 'a_origin' in 'BlockwiseGemmXdlops_v2'
  799 |     : a_thread_copy_(other.a_origin), b_thread_copy_(other.b_origin)
      |       ~~~~~...
```