composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
Hi there, I am wondering which implementation or example I can refer to if I want to run: RELU(GEMM(A[M,K]_INT8, B[K,N]_INT8) + bias_INT8) = output_INT8. I tried example/03_gemm_bias_relu, but it...
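For reference, a minimal host-side sketch of the requested semantics (int32 accumulation, bias add, ReLU, saturation back to int8). The `requant_scale` parameter is a hypothetical requantization factor, not CK API; this is just the math a fused kernel would have to reproduce:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Reference for RELU(GEMM(A, B) + bias) with int8 inputs and int8 output.
// Accumulate in int32 to avoid overflow; requant_scale (hypothetical) maps
// int32 accumulators back into int8 range before saturating.
void gemm_bias_relu_int8_ref(const std::vector<int8_t>& A,    // M x K, row-major
                             const std::vector<int8_t>& B,    // K x N, row-major
                             const std::vector<int8_t>& bias, // N
                             std::vector<int8_t>& C,          // M x N, row-major
                             int M, int N, int K, float requant_scale)
{
    for(int m = 0; m < M; ++m)
        for(int n = 0; n < N; ++n)
        {
            int32_t acc = 0;
            for(int k = 0; k < K; ++k)
                acc += int32_t(A[m * K + k]) * int32_t(B[k * N + n]);
            acc += int32_t(bias[n]);                              // bias add
            float v = std::max(0.0f, float(acc) * requant_scale); // ReLU
            C[m * N + n] = int8_t(std::min(v, 127.0f));           // saturate
        }
}
```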
Hello, I have some trouble compiling composable_kernel for my AMD GPU architecture (gfx1010):

```
cmake \
  -D CMAKE_PREFIX_PATH=/opt/rocm \
  -D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
  -D CMAKE_CXX_FLAGS="-O3" \
  -D CMAKE_BUILD_TYPE=Release \
  -D...
```
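The command above is truncated; assuming the architecture is selected with the `GPU_TARGETS` option that appears in the report below, a plausible complete configure line might look like this (a sketch, not a verified build recipe for gfx1010):

```
cmake \
  -D CMAKE_PREFIX_PATH=/opt/rocm \
  -D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
  -D CMAKE_CXX_FLAGS="-O3" \
  -D CMAKE_BUILD_TYPE=Release \
  -D GPU_TARGETS="gfx1010" \
  ..
```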
[Reproduce]

```
CXX=/opt/rocm/llvm/bin/clang++ cmake \
  -DCMAKE_CXX_COMPILER_LAUNCHER="${COMPILER_LAUNCHER}" \
  -DCMAKE_PREFIX_PATH=/opt/rocm \
  -DDTYPES="fp16;fp32;bf16" \
  -DCMAKE_BUILD_TYPE=Release \
  -DINSTANCES_ONLY=ON \
  -DGPU_TARGETS="gfx1100" ..
```

hence `-DDTYPES="fp16;fp32;bf16"`, then

```
make -j$(nproc)
```

[Observation]

```
/home/junliu/composable_kernel/library/src/tensor_operation_instance/gpu/grouped_gemm_fixed_nk/device_grouped_gemm_xdl_fixed_nk_f16_f8_f16_mk_kn_mn_instance.cpp:18:17: error: no type named 'f8_t' in namespace 'ck'...
```
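The error suggests the f16_f8_f16 grouped-GEMM instance is compiled even though `fp8` is not in the `DTYPES` list, so `ck::f8_t` is never defined. Assuming `fp8` is an accepted `DTYPES` entry (an assumption, not verified against the CMake options), a workaround sketch is to enable it; the underlying bug would be the missing `DTYPES` guard on that instance file:

```
# Workaround sketch: also enable fp8 so ck::f8_t gets defined.
# (Assumes "fp8" is a valid DTYPES entry.)
CXX=/opt/rocm/llvm/bin/clang++ cmake \
  -DCMAKE_PREFIX_PATH=/opt/rocm \
  -DDTYPES="fp16;fp32;bf16;fp8" \
  -DCMAKE_BUILD_TYPE=Release \
  -DINSTANCES_ONLY=ON \
  -DGPU_TARGETS="gfx1100" ..
```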
This is an example of a complex tensor contraction built from 4 real GEMMs. All imaginary and real tensors are explicitly defined.
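For context, this is the standard arithmetic identity behind the 4-GEMM decomposition, with real and imaginary parts stored as separate tensors. A minimal self-contained sketch of the math (not the CK device op):

```cpp
#include <vector>

// Naive real GEMM: C = beta * C + alpha * A * B (row-major, M x K times K x N).
void real_gemm(std::vector<float>& C, const std::vector<float>& A,
               const std::vector<float>& B, int M, int N, int K,
               float alpha, float beta)
{
    for(int m = 0; m < M; ++m)
        for(int n = 0; n < N; ++n)
        {
            float acc = 0.f;
            for(int k = 0; k < K; ++k)
                acc += A[m * K + k] * B[k * N + n];
            C[m * N + n] = beta * C[m * N + n] + alpha * acc;
        }
}

// Complex GEMM from 4 real GEMMs, real/imaginary parts explicitly separate:
//   C_re = A_re * B_re - A_im * B_im
//   C_im = A_re * B_im + A_im * B_re
void complex_gemm(std::vector<float>& C_re, std::vector<float>& C_im,
                  const std::vector<float>& A_re, const std::vector<float>& A_im,
                  const std::vector<float>& B_re, const std::vector<float>& B_im,
                  int M, int N, int K)
{
    real_gemm(C_re, A_re, B_re, M, N, K, +1.f, 0.f);
    real_gemm(C_re, A_im, B_im, M, N, K, -1.f, 1.f);
    real_gemm(C_im, A_re, B_im, M, N, K, +1.f, 0.f);
    real_gemm(C_im, A_im, B_re, M, N, K, +1.f, 1.f);
}
```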
This depends on PR#1028. Only a few files are modified:

```
modified: example/53_gemv_splitk/CMakeLists.txt
modified: example/54_tall_and_skinny_gemm_splitk/CMakeLists.txt
modified: example/54_tall_and_skinny_gemm_splitk/run_tall_and_skinny_gemm_splitk_example.inc
modified: include/ck/host_utility/kernel_launch.hpp
modified: include/ck/tensor_operation/gpu/device/impl/device_tall_and_skinny_gemm_splitk.hpp
modified: include/ck/tensor_operation/gpu/grid/gridwise_tall_and_skinny_gemm_splitk.hpp
conflict resolved: library/src/tensor_operation_instance/gpu/CMakeLists.txt
```
Tall-and-skinny GEMM and GEMV files are added so that the examples and ckprofiler work.
FlashAttentionV1:

- forward kloop: [gridwise_batched_mha_fwd_xdl_cshuffle_v1.hpp](https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/mha-train-develop/include/ck/tensor_operation/gpu/grid/gridwise_batched_mha_fwd_xdl_cshuffle_v1.hpp)
- backward kloop prototype1: [gridwise_batched_mha_bwd_xdl_cshuffle_kloop_v1.hpp](https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/mha-train-develop/include/ck/tensor_operation/gpu/grid/gridwise_batched_mha_bwd_xdl_cshuffle_kloop_v1.hpp)
- backward kloop prototype2: [gridwise_batched_mha_bwd_xdl_cshuffle_kloop_v2.hpp](https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/mha-train-develop/include/ck/tensor_operation/gpu/grid/gridwise_batched_mha_bwd_xdl_cshuffle_kloop_v2.hpp)

FlashAttentionV2:

- forward kloop: [gridwise_batched_mha_fwd_xdl_cshuffle_v2.hpp](https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/mha-train-develop/include/ck/tensor_operation/gpu/grid/gridwise_batched_mha_fwd_xdl_cshuffle_v2.hpp)
- backward qloop from bottom to top prototype1: [gridwise_batched_mha_bwd_xdl_cshuffle_qloop_b2t_v1.hpp](https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/mha-train-develop/include/ck/tensor_operation/gpu/grid/gridwise_batched_mha_bwd_xdl_cshuffle_qloop_b2t_v1.hpp)
- backward qloop from bottom to top...
A gtest is implemented for Complex Contraction Bilinear. As far as I can see, it passes two test suites, test_complex_contraction_bilinear and test_complex_contraction_bilinear_interface:

```
./test_complex_contraction_bilinear
./test_complex_contraction_bilinear_interface
```
@zjing14 This is a fully correct pipeline that supports packed fp4 (two `int4`s in a byte). It is used to demonstrate what might need to be changed to support subtype...
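As background on the "two `int4`s in a byte" layout, a minimal pack/unpack sketch with sign extension; the nibble ordering here (low nibble = element 0) is an assumption and may differ from what this pipeline actually uses:

```cpp
#include <cstdint>

// Pack two signed 4-bit values into one byte:
// low nibble = element 0, high nibble = element 1 (assumed ordering).
inline uint8_t pack_int4x2(int8_t lo, int8_t hi)
{
    return uint8_t(((hi & 0xF) << 4) | (lo & 0xF));
}

// Unpack with sign extension: shift each nibble into the top of an int8_t,
// then arithmetic-shift back down to recover values in [-8, 7].
inline void unpack_int4x2(uint8_t packed, int8_t& lo, int8_t& hi)
{
    lo = int8_t(packed << 4) >> 4;
    hi = int8_t(packed) >> 4;
}
```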