Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

Results 276 composable_kernel issues

## Proposed changes Added an example of bf16 * FP4 GEMM with bias and SwiGLU activation. In this implementation, both FP4 weights and FP4 scaling factors are stored in uint8...
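To make the storage format concrete, here is a minimal C++ reference sketch of unpacking packed FP4 weights and applying SwiGLU on the host. It assumes the OCP FP4 E2M1 encoding (1 sign bit, 2 exponent bits with bias 1, 1 mantissa bit), two FP4 values per `uint8` with the first element in the low nibble, and a plain `float` scale applied per value. The nibble order, the float scale, and the gate/up column split used for SwiGLU are all assumptions, and every function name is hypothetical rather than part of CK's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode one FP4 E2M1 code held in the low 4 bits of `nibble`.
// Representable magnitudes: 0, 0.5, 1, 1.5, 2, 3, 4, 6.
inline float decode_fp4_e2m1(std::uint8_t nibble) {
    const int sign = (nibble >> 3) & 0x1;
    const int exp  = (nibble >> 1) & 0x3;
    const int man  = nibble & 0x1;
    float mag;
    if (exp == 0) {
        mag = 0.5f * static_cast<float>(man);          // subnormal: 0 or 0.5
    } else {
        mag = std::ldexp(1.0f + 0.5f * man, exp - 1);  // normal: 2^(e-1) * (1 + m/2)
    }
    return sign ? -mag : mag;
}

// Unpack two FP4 values from one byte and apply a scale.
// Assumption: element 2*i lives in the low nibble, element 2*i+1 in the high nibble.
inline void unpack_fp4x2(std::uint8_t byte, float scale, float& lo, float& hi) {
    lo = decode_fp4_e2m1(byte & 0x0F) * scale;
    hi = decode_fp4_e2m1(byte >> 4) * scale;
}

// SiLU(x) = x * sigmoid(x).
inline float silu(float x) { return x / (1.0f + std::exp(-x)); }

// Reference SwiGLU over one GEMM+bias output row of width 2*n.
// Assumption: columns [0, n) are the gate and columns [n, 2n) are the "up" projection.
inline std::vector<float> swiglu(const std::vector<float>& row) {
    assert(row.size() % 2 == 0);
    const std::size_t n = row.size() / 2;
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) out[i] = silu(row[i]) * row[n + i];
    return out;
}
```

A host-side reference like this is mainly useful for validating a GPU kernel's output; the real kernel would dequantize inside registers rather than materialize float vectors.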

## Proposed changes Add a unit test for the FP4 warp GEMM. ## Checklist Please put an `x` into the boxes that apply. You can also fill these out after creating the...

## Proposed changes CK FA bwd supports atomic b16 for the `block_fmha_bwd_dq_dk_dv_pipeline_kr_ktr_vr_iglp` and `block_fmha_bwd_dq_dk_dv_pipeline_kr_ktr_vr` pipelines. ## Checklist Please put an `x` into the boxes that apply. You can also fill these...
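A compare-and-swap loop is a common portable way to express an atomic add for a type that lacks a native atomic-add instruction. The C++ sketch below shows the idea for a 16-bit bfloat accumulator on the CPU using `std::atomic<std::uint16_t>`. It assumes "b16" here means bf16, it is not how CK implements the GPU path, and the helper names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// bf16 is the upper 16 bits of an IEEE-754 float32.
inline float bf16_to_float(std::uint16_t h) {
    const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// float32 -> bf16 with round-to-nearest-even. NaN inputs are not handled in this sketch.
inline std::uint16_t float_to_bf16(float f) {
    assert(!std::isnan(f));
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    const std::uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<std::uint16_t>((bits + rounding_bias) >> 16);
}

// Atomically performs `target += value` in bf16 via a compare-and-swap loop.
// On failure, compare_exchange_weak reloads `expected` with the current value,
// so the sum is recomputed against whatever another thread just stored.
inline void atomic_add_bf16(std::atomic<std::uint16_t>& target, float value) {
    std::uint16_t expected = target.load(std::memory_order_relaxed);
    std::uint16_t desired;
    do {
        desired = float_to_bf16(bf16_to_float(expected) + value);
    } while (!target.compare_exchange_weak(expected, desired, std::memory_order_relaxed));
}
```

Note that each update rounds to bf16, so accumulating many small gradient contributions this way loses more precision than accumulating in fp32 and converting once at the end.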

Stale

## Proposed changes These changes implement SplitK support for `device_grouped_conv_fwd_multiple_abd_xdl_cshuffle_v3`. The implementation supports both one-stage and two-stage execution based on data type. Three execution paths are available: - **Two-stage with...
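In an implicit-GEMM formulation of forward convolution, the reduction (K) dimension typically comes from input channels times filter spatial size, and split-K partitions that dimension across work groups. The C++ sketch below illustrates the two execution styles named above on a plain row-major GEMM: a one-stage path where every K slice accumulates straight into the output (on a GPU this would need atomic adds), and a two-stage path where each slice writes its own fp32 partial result to a workspace that a second pass reduces. The function names are hypothetical, and the PR excerpt does not say which data types take which path; a common reason to prefer the two-stage path is that efficient atomic adds are not available for the output type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// out[m][n] += sum over k in [k0, k1) of A[m][k] * B[k][n]; all matrices row-major.
void accumulate_k_slice(const std::vector<float>& A, const std::vector<float>& B,
                        std::size_t M, std::size_t N, std::size_t K,
                        std::size_t k0, std::size_t k1, float* out) {
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (std::size_t k = k0; k < k1; ++k) acc += A[m * K + k] * B[k * N + n];
            out[m * N + n] += acc;
        }
}

// One stage: every slice adds directly into C (a GPU kernel would use atomics here).
std::vector<float> splitk_one_stage(const std::vector<float>& A, const std::vector<float>& B,
                                    std::size_t M, std::size_t N, std::size_t K,
                                    std::size_t k_batch) {
    assert(k_batch > 0);
    const std::size_t k_per = (K + k_batch - 1) / k_batch;
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t s = 0; s < k_batch; ++s) {
        const std::size_t k0 = std::min(K, s * k_per);
        const std::size_t k1 = std::min(K, k0 + k_per);
        accumulate_k_slice(A, B, M, N, K, k0, k1, C.data());
    }
    return C;
}

// Two stages: slices write private fp32 partials, then a separate pass reduces them.
std::vector<float> splitk_two_stage(const std::vector<float>& A, const std::vector<float>& B,
                                    std::size_t M, std::size_t N, std::size_t K,
                                    std::size_t k_batch) {
    assert(k_batch > 0);
    const std::size_t k_per = (K + k_batch - 1) / k_batch;
    std::vector<float> workspace(k_batch * M * N, 0.0f);
    for (std::size_t s = 0; s < k_batch; ++s) {  // stage 1: independent partial GEMMs
        const std::size_t k0 = std::min(K, s * k_per);
        const std::size_t k1 = std::min(K, k0 + k_per);
        accumulate_k_slice(A, B, M, N, K, k0, k1, workspace.data() + s * M * N);
    }
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t s = 0; s < k_batch; ++s)      // stage 2: reduction over slices
        for (std::size_t i = 0; i < M * N; ++i) C[i] += workspace[s * M * N + i];
    return C;
}
```

The two-stage path trades extra workspace memory and a second pass for avoiding contended atomic updates on the output.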

### Problem Description /home/richard/data/rocm-all-libs-build/rocm-libraries-build/rocm-libraries/projects/composablekernel/library/include/ck/library/tensor_operation_instance/add_device_operation_instance.hpp:17:54: error: no member named 'unique_ptr' in namespace 'std' 17 | void add_device_operation_instances(std::vector& op_instances, | ~~~~~^ /home/richard/data/rocm-all-libs-build/rocm-libraries-build/rocm-libraries/projects/composablekernel/library/include/ck/library/tensor_operation_instance/add_device_operation_instance.hpp:17:65: error: 'BaseOp' does not refer to a value 17 |...
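The first error reports that `std::unique_ptr` is not visible, which most likely means `<memory>` is not included (directly or transitively) where that header is compiled; the follow-on `'BaseOp' does not refer to a value` error is plausibly a cascade from the same failure. The C++ sketch below shows the general fix, a header that includes what it uses. `BaseOp`, `OpA`, `OpB`, and `add_instances` are hypothetical stand-ins, not CK's actual declarations.

```cpp
// Including <memory> and <vector> directly keeps the header self-sufficient
// instead of relying on another header to pull them in transitively.
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical operation hierarchy used only for illustration.
struct BaseOp {
    virtual ~BaseOp() = default;
    virtual int id() const = 0;
};
struct OpA : BaseOp { int id() const override { return 1; } };
struct OpB : BaseOp { int id() const override { return 2; } };

// Hypothetical helper: appends one default-constructed instance of each listed type.
template <typename Base, typename... Ops>
void add_instances(std::vector<std::unique_ptr<Base>>& op_instances) {
    const std::size_t before = op_instances.size();
    (op_instances.push_back(std::make_unique<Ops>()), ...);  // C++17 fold expression
    assert(op_instances.size() == before + sizeof...(Ops));
}
```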

status: triage

## Proposed changes Please describe the motivation behind the pull request, whether it enables a new feature or fixes a bug. If there are associated pull requests or issues, please...

## Proposed changes This adds a simple example showing how to use `async_load_tile` in ck_tile. ## Checklist Please put an `x` into the boxes that apply. You can also...

- Add CMake documentation infrastructure with automatic Python venv management
- Enable a streamlined docs build: `cmake --build . --target docs`

## Proposed changes Please describe the motivation behind the pull...

## Proposed changes The EpilogueChainer allows epilogues to be stacked one after another and run sequentially. The EpilogueChainer implementation is demonstrated using the cshuffle epilogue by breaking and chaining...
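The chaining idea can be sketched generically in C++ as a tuple of callables applied in order to the same output tile. This is a minimal illustration under that assumption; `EpilogueChainSketch` and the example stages are hypothetical and do not reflect the actual EpilogueChainer interface.

```cpp
#include <cassert>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical sketch (not CK's EpilogueChainer API): holds a fixed list of
// epilogue callables and runs them one after another on the same output tile.
template <typename... Epilogues>
struct EpilogueChainSketch {
    static_assert(sizeof...(Epilogues) > 0, "chain needs at least one epilogue");
    std::tuple<Epilogues...> stages;

    explicit EpilogueChainSketch(Epilogues... e) : stages(std::move(e)...) {}

    // The comma fold expression evaluates left to right, so stage i sees the
    // tile after stages 0..i-1 have modified it.
    template <typename Tile>
    void operator()(Tile& tile) const {
        std::apply([&tile](const auto&... stage) { (stage(tile), ...); }, stages);
    }
};
```

Because the stages live in a `std::tuple`, the chain's composition is fixed at compile time, which fits kernel code where epilogues are usually template parameters; a runtime list of `std::function` objects would be more flexible but adds indirect calls.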