Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

Results 151 composable_kernel issues

Added client examples for bwd qloop v1, v2, light v1, and light v2. We can now profile the flash attention backward qloop.

- [ ] The justification needs to be made and tracked here
- [ ] There should be a documentation task to update the README and installation guide. Reason: non-default and...

documentation
question

Hi! I'm the maintainer of ROCm community packages for [Arch Linux](https://github.com/rocm-arch/rocm-arch). Would it be possible to add tags / releases that match ROCm and MIOpen releases? This would greatly simplify...

Just found that:

1. `/data/composable_kernel/include/ck/tensor_operation/gpu/device/device_cgemm.hpp` defined `GetWorkspaceSize`
2. `/data/composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_cgemm_4gemm_xdl_cshuffle.hpp` implemented `GetWorkspaceSize`

The correct name should be `GetWorkSpaceSize`.

Under Investigation
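
A one-letter difference such as `GetWorkspaceSize` versus `GetWorkSpaceSize` can matter in C++ because a derived-class method only replaces a base-class virtual when the names match exactly. The sketch below illustrates that general hazard. It assumes, without confirmation from the issue text, that a base class declares a virtual `GetWorkSpaceSize`; `BaseOperator`, `MisspelledOp`, `FixedOp`, and `QueryWorkspace` are hypothetical stand-ins, not the actual CK classes or signatures.

```cpp
#include <cstddef>

// Hypothetical stand-in for a base operator interface (not the real CK class).
struct BaseOperator {
    virtual ~BaseOperator() = default;
    // Assumed spelling of the base-class hook: capital "S" in "Space".
    virtual std::size_t GetWorkSpaceSize() const { return 0; }
};

// Lowercase "s": this declares a new, unrelated method, so calls made
// through the base interface still reach the base-class default.
struct MisspelledOp : BaseOperator {
    std::size_t GetWorkspaceSize() const { return 1024; }
};

// Matching spelling plus `override`: the compiler rejects the declaration
// if no base-class virtual with this exact name and signature exists.
struct FixedOp : BaseOperator {
    std::size_t GetWorkSpaceSize() const override { return 1024; }
};

// Generic caller that only knows the base interface.
std::size_t QueryWorkspace(const BaseOperator& op) {
    return op.GetWorkSpaceSize();
}
```

Under these assumptions, `QueryWorkspace` sees the base default for `MisspelledOp` and the intended size for `FixedOp`. Marking overrides with `override` turns this kind of spelling mismatch into a compile error.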

For the fused (conv + bias + activation) operation, CK currently provides only RELU as the activation. It would be great to have other activations. We can start with the ones that take...

enhancement
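
One common way to make the activation in a (conv + bias + activation) epilogue swappable is to pass it as an element-wise functor. The host-side sketch below shows that idea only. `Relu`, `LeakyRelu`, `Sigmoid`, and `BiasActivation` are hypothetical names chosen for illustration and are not CK's API; the convolution output is assumed to be already computed and the bias is simplified to a single scalar.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical element-wise activation functors (illustrative names only).
struct Relu {
    float operator()(float x) const { return std::max(x, 0.0f); }
};

struct LeakyRelu {
    float alpha; // slope applied to negative inputs
    float operator()(float x) const { return x > 0.0f ? x : alpha * x; }
};

struct Sigmoid {
    float operator()(float x) const { return 1.0f / (1.0f + std::exp(-x)); }
};

// Applies bias and then the chosen activation to each convolution output.
// The activation is a template parameter, so adding a new one does not
// require changing this function.
template <typename Activation>
std::vector<float> BiasActivation(const std::vector<float>& conv_out,
                                  float bias,
                                  Activation act) {
    std::vector<float> out(conv_out.size());
    for (std::size_t i = 0; i < conv_out.size(); ++i) {
        out[i] = act(conv_out[i] + bias);
    }
    return out;
}
```

For example, `BiasActivation({-2.0f, 3.0f}, 1.0f, Relu{})` clamps the first element to zero, while `BiasActivation({-3.0f}, 1.0f, LeakyRelu{0.5f})` scales the negative sum by `alpha` instead.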