Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

Results 151 composable_kernel issues

Scaling values (e.g. `alpha` and `beta`) and float constants (e.g. `epsilon`) are used by our kernels and passed from the user through the Device Op API. The following is a...

When adding a new type of device operator instance, we also have to add the corresponding `add_device_xxxx_instances()` declarations to the header. This is error-prone and time-consuming. ```c++ // file: library/include/ck/library/tensor_operation_instance/gpu/gemm.hpp namespace ck...

`ck::Array` and `std::array` behave the same way. The only difference between the two types is that the former has a templated assignment operator. I think `ck::Array` can be used in most use...
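To make the difference described in the snippet above concrete, here is a minimal sketch of what a templated assignment operator allows. `MiniArray` and `sum_after_assign` are hypothetical names invented for this illustration; this is not the real `ck::Array`, whose exact interface is not shown in the snippet.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for an array type with a templated assignment
// operator. NOT the real ck::Array; it only illustrates the idea.
template <typename T, std::size_t N>
struct MiniArray
{
    T data[N];

    // Templated assignment: accepts any source indexable with operator[]
    // whose elements are convertible to T.
    template <typename Src>
    MiniArray& operator=(const Src& src)
    {
        for(std::size_t i = 0; i < N; ++i)
            data[i] = static_cast<T>(src[i]);
        return *this;
    }

    T& operator[](std::size_t i) { return data[i]; }
    const T& operator[](std::size_t i) const { return data[i]; }
};

// Assigns a std::array<int, 3> into a MiniArray<long, 3> and sums the result.
inline long sum_after_assign()
{
    std::array<int, 3> src{1, 2, 3};
    MiniArray<long, 3> dst{};
    // The equivalent std::array<long, 3> = std::array<int, 3> would not compile,
    // because std::array only has the implicit same-type assignment.
    dst = src;
    return dst[0] + dst[1] + dst[2];
}
```

Under these assumptions, the templated operator is the only thing that separates the two types: code that never assigns across element types could use either one.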

There is a lot of duplicated code in the implementations, such as the `HostTensorDescriptor` creation logic. ```c++ auto f_host_tensor_descriptor1d = [](std::size_t len, std::size_t stride) { return HostTensorDescriptor({len}, {stride}); }; auto f_host_tensor_descriptor2d = [](std::size_t...
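One possible way to remove the per-rank lambdas quoted above is a single rank-agnostic helper. The sketch below is only an illustration: `FakeHostTensorDescriptor` and `make_descriptor` are hypothetical names, and the stand-in struct exists solely so the example compiles without the library. It is not the fix proposed in the issue, which is truncated here.

```cpp
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical stand-in for HostTensorDescriptor so this sketch is
// self-contained. The real class lives in the library and may differ.
struct FakeHostTensorDescriptor
{
    std::vector<std::size_t> lengths;
    std::vector<std::size_t> strides;
};

// One helper for any rank, replacing separate 1d/2d/... lambdas.
inline FakeHostTensorDescriptor
make_descriptor(std::initializer_list<std::size_t> lengths,
                std::initializer_list<std::size_t> strides)
{
    return FakeHostTensorDescriptor{std::vector<std::size_t>(lengths),
                                    std::vector<std::size_t>(strides)};
}
```

A call such as `make_descriptor({4, 8}, {8, 1})` would then describe a 4x8 row-major tensor, and the same function covers the 1D case with `make_descriptor({len}, {stride})`.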

Currently we put headers into _include/**ck/xxxxx**_ sub-directories except _ckProfiler_ ```console $ tree library/include/ -L 3 library/include/ └── ck └── library ├── reference_tensor_operation ├── tensor_operation_instance └── utility $ tree profiler/include/ -L...

For targets like _ckProfiler_, I found that the existing source files and the `add_executable()` arguments are identical. We can see the same symptom in the instance libraries: - First argument of...

We do not have grouped att for now, @rosenrodt . @asroy Do we need instances for group bmm+softmax+gemm+permute _Originally posted by @shaojiewang in https://github.com/ROCmSoftwarePlatform/composable_kernel/issues/425#issuecomment-1252494368_

Some test cases fail due to a mismatch between the tensor's rank and the accessor's rank. The trick here is to not specify CMAKE_BUILD_TYPE, so the resulting compiler flags will not include -DNDEBUG....
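The role of `-DNDEBUG` in the snippet above follows from how the standard `assert` macro works: it expands to nothing when `NDEBUG` is defined, so a rank check written with `assert` only fires in builds without that flag. The sketch below assumes the rank check is an `assert`, which the truncated snippet does not confirm; `ToyTensor` and `asserts_enabled` are hypothetical names, not library types.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tensor with a runtime rank, for illustration only.
struct ToyTensor
{
    std::vector<std::size_t> lengths; // one entry per dimension
    std::vector<float> data;          // row-major storage

    template <typename... Is>
    float& operator()(Is... is)
    {
        // Compiled out when NDEBUG is defined, so a rank mismatch goes
        // unnoticed in such builds and the indexing below becomes undefined.
        assert(sizeof...(Is) == lengths.size() && "accessor rank must match tensor rank");

        const std::array<std::size_t, sizeof...(Is)> idx{static_cast<std::size_t>(is)...};
        std::size_t offset = 0;
        for(std::size_t i = 0; i < idx.size(); ++i)
            offset = offset * lengths[i] + idx[i]; // row-major linearization
        return data[offset];
    }
};

// Reports whether assert() is active in the current translation unit.
constexpr bool asserts_enabled()
{
#ifdef NDEBUG
    return false;
#else
    return true;
#endif
}
```

With a tensor of lengths `{2, 3}`, calling `t(1, 2)` passes the check, while `t(1)` would abort only in a build where `asserts_enabled()` is true.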

- [ ] https://github.com/ROCmSoftwarePlatform/composable_kernel/pull/395#discussion_r964097258 - [ ] https://github.com/ROCmSoftwarePlatform/composable_kernel/pull/395#discussion_r964097985 - [ ] https://github.com/ROCmSoftwarePlatform/composable_kernel/pull/395#discussion_r964098118 - [ ] https://github.com/ROCmSoftwarePlatform/composable_kernel/pull/399#discussion_r964049341

code quality

In the fused attention kernel implementation, we ran into an "M0_K_M1 reduce K" problem that the original `PartitionedBlockwiseReduction` does not quite capture. To that end we introduced an [ad-hoc...