composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
As a follow-up to this PR: https://github.com/ROCm/composable_kernel/pull/1228, we need to add: - [ ] unit tests covering all defined instances.
Would it be difficult to fuse int4 dequantization with GEMM using the existing template? What would you suggest?
I modified the parameters of the row template by referring to the layout of other B. Compilation succeeded, but the computed result was incorrect. When retrieving multiple data at...
### Problem Description I get much higher performance than I do with ckProfiler. Is this just some sort of CU-level unit test? ### Operating System Ubuntu 22.04 ### CPU AMD...
CK_BUILD_JIT_LIB seems to be a required option for building migraphx (shown below). However, the variable seems to have been removed from your main library. Must I switch to a branch of your library for building...
I have an example implementation of a block-sparse attention kernel that builds on top of the existing fmha_fwd example. Current performance sees roughly a 2.22x speedup at a 4K sequence...
**Problem:** 1. Wrong results when running example_gemm_xdl_fp16. 2. On one MI250 GPU, I got only ~110 TFLOPS using the default GEMM problem size, which is lower than expected. Does this...
### Problem Description I currently am unable to compile for debugging with `-O0`. When I add that flag, I get an error "error: Illegal instruction detected: Operand has incorrect register...
### Problem Description Compilation error in debug on Visual Studio compiler ``` In file included from D:/workspace/oidn/devices/hip/../../external/composable_kernel/include\ck/utility/common_header.hpp:46: D:/workspace/oidn/devices/hip/../../external/composable_kernel/include\ck/utility/amd_inline_asm.hpp:212:21: error: instruction not supported on this GPU asm volatile("\n \ ^ :2:14:...