
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

Results: 151 composable_kernel issues, sorted by recently updated

We found that the backward weight convolution kernels lead to errors when profiling is enabled for the CK invoker run() functions, which made the CK-based solver fail in MIOpen. We have...

Under Investigation

Add ability to use the new iglp_opt(2) builtin, off by default.
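A minimal sketch of how such an opt-in switch might look, assuming the Clang AMDGPU builtin __builtin_amdgcn_iglp_opt is available on the target; the CK_USE_AMD_IGLP_OPT macro name is purely hypothetical:

```cpp
// Hypothetical gating of the iglp_opt(2) scheduling hint: off unless the build
// opts in (CK_USE_AMD_IGLP_OPT is an assumed flag, not an existing CK option).
#include <hip/hip_runtime.h>

__device__ __forceinline__ void maybe_iglp_hint()
{
#if defined(CK_USE_AMD_IGLP_OPT) && defined(__gfx90a__)
    // Ask the backend scheduler to apply the iglp_opt(2) instruction-group
    // pipelining strategy to the surrounding code (supported-target assumption).
    __builtin_amdgcn_iglp_opt(2);
#endif
}
```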

### Problem Description Different matrix paddings for CK, such as GemmSpec = GemmSpecialization::MKPadding, GemmSpec = GemmSpecialization::NKPadding, GemmSpec = GemmSpecialization::MNKPadding, GemmSpec = GemmSpecialization::KPadding. Will it have any...
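For context, the specializations listed above differ only in which GEMM dimensions get padded up to the block tile sizes. A hedged sketch of the selection logic; the enum and the choose_gemm_spec helper below are simplified stand-ins, not CK's actual API:

```cpp
// Illustrative only: a simplified stand-in for CK's padding specializations.
#include <cstdint>

enum class GemmSpecialization { Default, MPadding, NPadding, KPadding,
                                MNPadding, MKPadding, NKPadding, MNKPadding };

// Pick a padding variant based on which dimensions are not multiples of the
// per-block tile sizes; padding trades a little performance for generality.
GemmSpecialization choose_gemm_spec(std::int64_t M, std::int64_t N, std::int64_t K,
                                    std::int64_t MPerBlock, std::int64_t NPerBlock,
                                    std::int64_t KPerBlock)
{
    const bool padM = (M % MPerBlock) != 0;
    const bool padN = (N % NPerBlock) != 0;
    const bool padK = (K % KPerBlock) != 0;

    if (padM && padN && padK) return GemmSpecialization::MNKPadding;
    if (padM && padK)         return GemmSpecialization::MKPadding;
    if (padN && padK)         return GemmSpecialization::NKPadding;
    if (padM && padN)         return GemmSpecialization::MNPadding;
    if (padK)                 return GemmSpecialization::KPadding;
    if (padM)                 return GemmSpecialization::MPadding;
    if (padN)                 return GemmSpecialization::NPadding;
    return GemmSpecialization::Default;
}
```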

XDL kernels are instantiated with many different BlockSize/MPerBlock/NPerBlock/etc. template parameters in the library, so it's easy to pick a good set of parameters for a particular convolution. But unfortunately the...

enhancement
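On the tuning-parameter point above: each BlockSize/MPerBlock/NPerBlock/... combination is a separate template instantiation, which is why the library ships many pre-built variants to choose from. A rough, hypothetical illustration (not CK's real types):

```cpp
// Hypothetical illustration: every tuning-parameter combination is a distinct
// type, so covering many problem shapes means instantiating many kernels.
#include <cstdio>

template <int BlockSize, int MPerBlock, int NPerBlock, int KPerBlock>
struct GemmKernelConfig
{
    static void describe()
    {
        std::printf("BlockSize=%d MPerBlock=%d NPerBlock=%d KPerBlock=%d\n",
                    BlockSize, MPerBlock, NPerBlock, KPerBlock);
    }
};

int main()
{
    // Two different tile shapes -> two unrelated types, compiled separately.
    GemmKernelConfig<256, 256, 128, 32>::describe();
    GemmKernelConfig<256, 128, 128, 64>::describe();
    return 0;
}
```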

### Problem Description I have a 7900XTX (RDNA3, navi3, gfx1100) card on which I'm trying to do some useful LLM work, and one of the requirements I have is xformers. I...

Hi, will composable_kernel provide an interface similar to CUTLASS? Some projects based on CUTLASS, such as FasterTransformer and xformers, are difficult to hipify.

CK Tile Programming Interface and some examples

WIP

Hello, I tried to change the mfma_f32_32x32x8f16 instruction to the mfma_f32_16x16x16f16 instruction in [grouped_multihead_attention_forward_v2.cpp](https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/mha-train-develop/example/32_batched_gemm_scale_softmax_gemm/grouped_multihead_attention_forward_v2.cpp), but I get wrong results. Is there anything else that needs to be modified besides the GEMM parameters?
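On the MFMA question above: the two instructions accumulate into fragments of different sizes (a 32x32 output tile gives 16 f32 values per lane, a 16x16 tile gives 4), so the per-wave tile parameters and the output-to-thread mapping have to change together with the instruction, not just the GEMM parameters. A minimal hedged sketch of the raw builtins, assuming a CDNA target such as gfx90a; the vector typedefs are local helpers:

```cpp
// Hedged sketch (not the CK implementation): the accumulator fragment size
// differs between the two MFMA shapes, which is why swapping only the
// instruction name produces wrong results.
#include <hip/hip_runtime.h>

typedef _Float16 half4_t   __attribute__((ext_vector_type(4)));
typedef float    float4_t  __attribute__((ext_vector_type(4)));
typedef float    float16_t __attribute__((ext_vector_type(16)));

__device__ float16_t mfma_32x32x8(half4_t a, half4_t b, float16_t c)
{
    // 32x32 output tile per wavefront -> 16 f32 accumulators per lane.
    return __builtin_amdgcn_mfma_f32_32x32x8f16(a, b, c, 0, 0, 0);
}

__device__ float4_t mfma_16x16x16(half4_t a, half4_t b, float4_t c)
{
    // 16x16 output tile per wavefront -> only 4 f32 accumulators per lane.
    return __builtin_amdgcn_mfma_f32_16x16x16f16(a, b, c, 0, 0, 0);
}
```

In CK's XDL kernels this typically means adjusting the per-wave tile parameters (e.g. the MPerXDL/NPerXDL-style values) and the output thread mapping consistently with the chosen instruction; the exact parameters depend on the kernel variant.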

### Problem Description Attempting to compile [22db1e0] on my system. I am running into the following CMake error: ```CMake Error at CMakeLists.txt:427 (file): file failed to open for reading (No...

I was glad to see [Flash Attention ported to ROCm](https://github.com/ROCmSoftwarePlatform/flash-attention); however, compatibility is currently limited to gfx90a. I and many others would love to see this on other architectures. When...

Under Investigation