composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
## Proposed changes Added an example of bf16 * FP4 GEMM with bias and SwiGLU activation. In this implementation, both FP4 weights and FP4 scaling factors are stored in uint8...
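The storage scheme mentioned in the entry above (FP4 values held in `uint8`) and the SwiGLU activation can be illustrated with a small host-side sketch. It is not taken from the pull request: it assumes the E2M1 encoding for FP4 and assumes that the low nibble of each byte holds the first element, and the function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <utility>

// Decode one 4-bit code. ASSUMPTION: E2M1 layout (1 sign, 2 exponent, 1 mantissa bit).
inline float decode_fp4_e2m1(std::uint8_t code)
{
    assert(code < 16);
    static constexpr std::array<float, 8> magnitude = {
        0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};
    const float m = magnitude[code & 0x7u];
    return (code & 0x8u) ? -m : m; // bit 3 is the sign bit
}

// Unpack one byte into two FP4 values.
// ASSUMPTION: the low nibble is the first element, the high nibble the second.
inline std::pair<float, float> unpack_fp4x2(std::uint8_t packed)
{
    const auto lo = static_cast<std::uint8_t>(packed & 0x0Fu);
    const auto hi = static_cast<std::uint8_t>(packed >> 4);
    return {decode_fp4_e2m1(lo), decode_fp4_e2m1(hi)};
}

// SwiGLU on one (gate, up) pair: silu(gate) * up, with silu(x) = x / (1 + exp(-x)).
inline float swiglu(float gate, float up)
{
    return gate / (1.0f + std::exp(-gate)) * up;
}
```

A real kernel would also apply the per-block scaling factors and the bias before the activation; those steps are omitted here because the truncated description does not specify them.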
## Proposed changes Add unit test for fp4 warp gemm. ## Checklist Please put an `x` into the boxes that apply. You can also fill these out after creating the...
## Proposed changes ck fa bwd now supports atomic b16 for the `block_fmha_bwd_dq_dk_dv_pipeline_kr_ktr_vr_iglp` and `block_fmha_bwd_dq_dk_dv_pipeline_kr_ktr_vr` pipelines. ## Checklist Please put an `x` into the boxes that apply. You can also fill these...
## Proposed changes These changes implement SplitK support for `device_grouped_conv_fwd_multiple_abd_xdl_cshuffle_v3`. The implementation supports both one-stage and two-stage execution based on data type. Three execution paths are available: - **Two-stage with...
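As a rough illustration of the general split-K idea with two-stage execution, the CPU sketch below splits the reduction dimension of a plain GEMM into slices, writes each slice's partial result into its own workspace slab, and then sums the slabs. It is not the `device_grouped_conv_fwd_multiple_abd_xdl_cshuffle_v3` implementation; the function name, the row-major layout, and the use of GEMM instead of convolution are assumptions made for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Two-stage split-K for C[M x N] = A[M x K] * B[K x N], all row-major (hypothetical helper).
inline std::vector<float> gemm_split_k_two_stage(const std::vector<float>& a,
                                                 const std::vector<float>& b,
                                                 std::size_t M,
                                                 std::size_t N,
                                                 std::size_t K,
                                                 std::size_t k_splits)
{
    assert(k_splits > 0);
    assert(a.size() == M * K && b.size() == K * N);

    const std::size_t k_per_split = (K + k_splits - 1) / k_splits; // ceiling division
    std::vector<float> workspace(k_splits * M * N, 0.0f);

    // Stage 1: every split reduces only its own slice of K into a private slab.
    for(std::size_t s = 0; s < k_splits; ++s)
    {
        const std::size_t k_begin = std::min(K, s * k_per_split);
        const std::size_t k_end   = std::min(K, k_begin + k_per_split);
        float* partial            = workspace.data() + s * M * N;
        for(std::size_t m = 0; m < M; ++m)
            for(std::size_t n = 0; n < N; ++n)
                for(std::size_t k = k_begin; k < k_end; ++k)
                    partial[m * N + n] += a[m * K + k] * b[k * N + n];
    }

    // Stage 2: sum the partial slabs into the final output.
    std::vector<float> c(M * N, 0.0f);
    for(std::size_t s = 0; s < k_splits; ++s)
        for(std::size_t i = 0; i < M * N; ++i)
            c[i] += workspace[s * M * N + i];
    return c;
}
```

A one-stage variant would accumulate each slice directly into the output (on a GPU, typically with atomic adds) instead of going through a workspace, which depends on the output data type supporting such accumulation.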
### Problem Description /home/richard/data/rocm-all-libs-build/rocm-libraries-build/rocm-libraries/projects/composablekernel/library/include/ck/library/tensor_operation_instance/add_device_operation_instance.hpp:17:54: error: no member named 'unique_ptr' in namespace 'std' 17 | void add_device_operation_instances(std::vector& op_instances, | ~~~~~^ /home/richard/data/rocm-all-libs-build/rocm-libraries-build/rocm-libraries/projects/composablekernel/library/include/ck/library/tensor_operation_instance/add_device_operation_instance.hpp:17:65: error: 'BaseOp' does not refer to a value 17 |...
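An error of the form `no member named 'unique_ptr' in namespace 'std'` usually means that `<memory>` is not visible at the point of use, either directly or through another header. The sketch below shows a self-contained declaration that includes it explicitly. The template arguments in the quoted log were lost, so `BaseOp`, `ExampleOp`, and `add_instance_sketch` are hypothetical stand-ins and not the actual contents of `add_device_operation_instance.hpp`.

```cpp
#include <cassert>
#include <memory> // declares std::unique_ptr and std::make_unique
#include <vector>

// Hypothetical base type and one derived operation, for illustration only.
struct BaseOp
{
    virtual ~BaseOp() = default;
};
struct ExampleOp : BaseOp
{
};

// Append one newly constructed operation instance to the list.
template <typename Op>
void add_instance_sketch(std::vector<std::unique_ptr<BaseOp>>& op_instances)
{
    op_instances.push_back(std::make_unique<Op>());
    assert(op_instances.back() != nullptr);
}
```

Whether a missing include is the actual cause in this build cannot be confirmed from the truncated log alone.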
## Proposed changes Please describe the motivation behind the pull request, whether it enables a new feature or fixes a bug. If there are associated pull requests or issues, please...
## Proposed changes Adds a simple example showing how to use `async_load_tile` in ck_tile. ## Checklist Please put an `x` into the boxes that apply. You can also...
- Add CMake documentation infrastructure with auto Python venv management - Enable streamlined docs build: cmake --build . --target docs ## Proposed changes Please describe the motivation behind the pull...
## Proposed changes The EpilogueChainer allows stacking epilogues one after another and running them sequentially. The EpilogueChainer implementation is demonstrated using the cshuffle epilogue by breaking and chaining...
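The idea of running stacked epilogues in sequence can be sketched with a small variadic wrapper that applies each stage, in order, to an accumulator value. This is only a conceptual illustration, assuming scalar stages and C++17: `EpilogueChainSketch`, `AddBias`, `Relu`, and `Scale` are hypothetical names and do not reflect the actual EpilogueChainer interface.

```cpp
#include <cassert>
#include <tuple>
#include <utility>

// Holds a fixed list of epilogue stages and runs them one after another.
template <typename... Epilogues>
class EpilogueChainSketch
{
public:
    explicit EpilogueChainSketch(Epilogues... epilogues) : epilogues_(std::move(epilogues)...) {}

    // Apply every stage in declaration order; each stage sees the previous stage's result.
    float operator()(float acc) const
    {
        std::apply([&acc](const auto&... stage) { ((acc = stage(acc)), ...); }, epilogues_);
        return acc;
    }

private:
    std::tuple<Epilogues...> epilogues_;
};

// Three toy stages (hypothetical).
struct AddBias
{
    float bias;
    float operator()(float x) const { return x + bias; }
};
struct Relu
{
    float operator()(float x) const { return x > 0.0f ? x : 0.0f; }
};
struct Scale
{
    float factor;
    float operator()(float x) const { return x * factor; }
};

// Usage check: (2 + 1) -> ReLU -> * 2 gives 6; (-3 + 1) -> ReLU -> * 2 gives 0.
inline void epilogue_chain_demo()
{
    const EpilogueChainSketch<AddBias, Relu, Scale> chain{AddBias{1.0f}, Relu{}, Scale{2.0f}};
    assert(chain(2.0f) == 6.0f);
    assert(chain(-3.0f) == 0.0f);
}
```

Keeping the stages in a `std::tuple` fixes the chain at compile time, so the compiler can inline each stage; a real epilogue would operate on tiles in registers or shared memory, not on a single `float`.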