
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

Results: 151 composable_kernel issues, sorted by recently updated

We found that the backward weight convolution kernels lead to errors when profiling is enabled for the CK invoker run() functions, which made the CK-based solver fail in MIOpen. We have...

Under Investigation

Add ability to use the new iglp_opt(2) builtin, off by default.
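A minimal sketch of how such an opt-in switch might look, assuming the Clang AMDGPU builtin __builtin_amdgcn_iglp_opt is available on the target; the CK_USE_AMD_IGLP_OPT macro name is purely hypothetical:

```cpp
// Hypothetical gating of the iglp_opt(2) scheduling hint: off unless the build
// opts in (CK_USE_AMD_IGLP_OPT is an assumed flag, not an existing CK option).
#include <hip/hip_runtime.h>

__device__ __forceinline__ void maybe_iglp_hint()
{
#if defined(CK_USE_AMD_IGLP_OPT) && defined(__gfx90a__)
    // Ask the backend scheduler to apply the iglp_opt(2) instruction-group
    // pipelining strategy to the surrounding code (supported-target assumption).
    __builtin_amdgcn_iglp_opt(2);
#endif
}
```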

### Problem Description Different matrix paddings for CK, such as GemmSpec = GemmSpecialization::MKPadding, GemmSpec = GemmSpecialization::NKPadding, GemmSpec = GemmSpecialization::MNKPadding, GemmSpec = GemmSpecialization::KPadding. Will it have any...
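For context, the specializations listed above differ only in which GEMM dimensions get padded up to the block tile sizes. A hedged sketch of the selection logic; the enum and the choose_gemm_spec helper below are simplified stand-ins, not CK's actual API:

```cpp
// Illustrative only: a simplified stand-in for CK's padding specializations.
#include <cstdint>

enum class GemmSpecialization { Default, MPadding, NPadding, KPadding,
                                MNPadding, MKPadding, NKPadding, MNKPadding };

// Pick a padding variant based on which dimensions are not multiples of the
// per-block tile sizes; padding trades a little performance for generality.
GemmSpecialization choose_gemm_spec(std::int64_t M, std::int64_t N, std::int64_t K,
                                    std::int64_t MPerBlock, std::int64_t NPerBlock,
                                    std::int64_t KPerBlock)
{
    const bool padM = (M % MPerBlock) != 0;
    const bool padN = (N % NPerBlock) != 0;
    const bool padK = (K % KPerBlock) != 0;

    if (padM && padN && padK) return GemmSpecialization::MNKPadding;
    if (padM && padK)         return GemmSpecialization::MKPadding;
    if (padN && padK)         return GemmSpecialization::NKPadding;
    if (padM && padN)         return GemmSpecialization::MNPadding;
    if (padK)                 return GemmSpecialization::KPadding;
    if (padM)                 return GemmSpecialization::MPadding;
    if (padN)                 return GemmSpecialization::NPadding;
    return GemmSpecialization::Default;
}
```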

XDL kernels are instantiated with many different BlockSize/MPerBlock/NPerBlock/etc. template parameters in the library, so it's easy to pick a good set of parameters for a particular convolution. But unfortunately the...

enhancement
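On the tuning-parameter point above: each BlockSize/MPerBlock/NPerBlock/... combination is a separate template instantiation, which is why the library ships many pre-built variants to choose from. A rough, hypothetical illustration (not CK's real types):

```cpp
// Hypothetical illustration: every tuning-parameter combination is a distinct
// type, so covering many problem shapes means instantiating many kernels.
#include <cstdio>

template <int BlockSize, int MPerBlock, int NPerBlock, int KPerBlock>
struct GemmKernelConfig
{
    static void describe()
    {
        std::printf("BlockSize=%d MPerBlock=%d NPerBlock=%d KPerBlock=%d\n",
                    BlockSize, MPerBlock, NPerBlock, KPerBlock);
    }
};

int main()
{
    // Two different tile shapes -> two unrelated types, compiled separately.
    GemmKernelConfig<256, 256, 128, 32>::describe();
    GemmKernelConfig<256, 128, 128, 64>::describe();
    return 0;
}
```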

### Problem Description I have a 7900XTX (RDNA3, navi3, gfx1100) card on which I'm trying to do some useful LLM work, and one of the requirements I have is xformers. I...

Hi, will composable_kernel provide an interface similar to CUTLASS? Some projects based on CUTLASS, such as FasterTransformer and xformers, are difficult to hipify.

CK Tile Programming Interface and some examples

WIP

Hello, I tried to change the mfma_f32_32x32x8f16 instruction to the mfma_f32_16x16x16f16 instruction in [grouped_multihead_attention_forward_v2.cpp](https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/mha-train-develop/example/32_batched_gemm_scale_softmax_gemm/grouped_multihead_attention_forward_v2.cpp), but I get wrong results. Is there anything else that needs to be modified besides the GEMM parameters?
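On the MFMA question above: the two instructions accumulate into fragments of different sizes (a 32x32 output tile gives 16 f32 values per lane, a 16x16 tile gives 4), so the per-wave tile parameters and the output-to-thread mapping have to change together with the instruction, not just the GEMM parameters. A minimal hedged sketch of the raw builtins, assuming a CDNA target such as gfx90a; the vector typedefs are local helpers:

```cpp
// Hedged sketch (not the CK implementation): the accumulator fragment size
// differs between the two MFMA shapes, which is why swapping only the
// instruction name produces wrong results.
#include <hip/hip_runtime.h>

typedef _Float16 half4_t   __attribute__((ext_vector_type(4)));
typedef float    float4_t  __attribute__((ext_vector_type(4)));
typedef float    float16_t __attribute__((ext_vector_type(16)));

__device__ float16_t mfma_32x32x8(half4_t a, half4_t b, float16_t c)
{
    // 32x32 output tile per wavefront -> 16 f32 accumulators per lane.
    return __builtin_amdgcn_mfma_f32_32x32x8f16(a, b, c, 0, 0, 0);
}

__device__ float4_t mfma_16x16x16(half4_t a, half4_t b, float4_t c)
{
    // 16x16 output tile per wavefront -> only 4 f32 accumulators per lane.
    return __builtin_amdgcn_mfma_f32_16x16x16f16(a, b, c, 0, 0, 0);
}
```

In CK's XDL kernels this typically means adjusting the per-wave tile parameters (e.g. the MPerXDL/NPerXDL-style values) and the output thread mapping consistently with the chosen instruction; the exact parameters depend on the kernel variant.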

### Problem Description Attempting to compile [22db1e0] on my system. I am running into the following CMake error: ```CMake Error at CMakeLists.txt:427 (file): file failed to open for reading (No...

I was glad to see [Flash Attention ported to ROCm](https://github.com/ROCmSoftwarePlatform/flash-attention); however, compatibility is currently limited to gfx90a. I and many others would love to see this on other architectures. When...

Under Investigation