composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
Added client examples for bwd qloop v1, v2, light v1 and light v2, so the flash attention backward qloop can now be profiled.
- [ ] The justification needs to be made and tracked here
- [ ] There should be a documentation task to update the README and installation guide. Reason: non-default and...
Hi! I'm the maintainer of ROCm community packages for [Arch Linux](https://github.com/rocm-arch/rocm-arch). Would it be possible to add tags / releases that match ROCm and MIOpen releases? This would greatly simplify...
Just found that:
1. `/data/composable_kernel/include/ck/tensor_operation/gpu/device/device_cgemm.hpp` defined `GetWorkspaceSize`
2. `/data/composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_cgemm_4gemm_xdl_cshuffle.hpp` implemented `GetWorkspaceSize`

The correct name should be `GetWorkSpaceSize`.
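To illustrate why such a spelling difference can matter, here is a minimal sketch. It assumes the concern is that a base class declares the virtual function as `GetWorkSpaceSize` while a derived operator spells it `GetWorkspaceSize`; the issue text does not confirm this. `BaseOp`, `MismatchedOp`, `MatchedOp`, and `QueryWorkspace` are hypothetical stand-ins, not the actual CK classes.

```cpp
#include <cstddef>

// Hypothetical base operator (not the real CK class).
struct BaseOp {
    virtual ~BaseOp() = default;
    // Spelling that callers use; the default reports no workspace.
    virtual std::size_t GetWorkSpaceSize() const { return 0; }
};

// The capitalization differs, so this declares a new, unrelated function
// instead of overriding the base one. Marking it `override` would turn
// the mismatch into a compile-time error.
struct MismatchedOp : BaseOp {
    virtual std::size_t GetWorkspaceSize() const { return 1024; }
};

// Same spelling as the base class, checked by `override`.
struct MatchedOp : BaseOp {
    std::size_t GetWorkSpaceSize() const override { return 1024; }
};

// Callers that only see the base interface get the base default
// whenever the derived name does not match.
std::size_t QueryWorkspace(const BaseOp& op) { return op.GetWorkSpaceSize(); }
```

With these definitions, `QueryWorkspace(MismatchedOp{})` falls back to the base implementation, while `QueryWorkspace(MatchedOp{})` reaches the derived one.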
In CK, the fused (conv + bias + activation) operation currently has ReLU as its activation. It would be great to have other activations. We can start with the ones that take...
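One way to picture the request is an epilogue functor that takes the activation as a template parameter. The sketch below is an assumption for illustration only: `Relu`, `Sigmoid`, and `AddBiasActivation` are hypothetical names and do not reflect CK's actual element-wise operation types.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical activation functors (illustrative, not CK's own types).
struct Relu {
    float operator()(float x) const { return std::max(x, 0.0f); }
};

struct Sigmoid {
    float operator()(float x) const { return 1.0f / (1.0f + std::exp(-x)); }
};

// Fused epilogue: add the bias to one convolution output element,
// then apply whichever activation was selected at compile time.
template <typename Activation>
struct AddBiasActivation {
    Activation act{};

    float operator()(float conv_out, float bias) const {
        return act(conv_out + bias);
    }
};
```

Under this design, supporting another activation means adding one more small functor; the bias-add step stays unchanged.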