
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

Results 151 composable_kernel issues

We have discovered flaky fp8/bf8 tests that fail on ROCm 6.1 and newer when rounding to nearest even. Disabling this test in https://github.com/ROCm/composable_kernel/pull/1495 until a compiler patch becomes available. @illsilin @junliume
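For readers unfamiliar with the rounding mode named above, here is a minimal sketch of round-to-nearest-even applied when the low bits of an integer mantissa are dropped, which is the core step of narrowing a wider float to an 8-bit format. The function name `round_nearest_even` and its parameters are hypothetical and chosen for illustration; this is not CK's conversion routine and it ignores exponent handling, overflow, and NaN/Inf.

```cpp
#include <cstdint>

// Illustrative helper (hypothetical name, not part of CK).
// Drops the low `drop_bits` bits of `value`, rounding to nearest
// with ties going to the even result.
// Assumption: 0 < drop_bits < 32.
constexpr std::uint32_t round_nearest_even(std::uint32_t value, unsigned drop_bits)
{
    const std::uint32_t half = 1u << (drop_bits - 1);   // exact midpoint of the dropped range
    const std::uint32_t mask = (1u << drop_bits) - 1u;  // selects the dropped bits
    const std::uint32_t kept = value >> drop_bits;      // truncated result
    const std::uint32_t rem  = value & mask;            // what truncation discards

    if(rem > half) { return kept + 1u; }  // above the midpoint: round up
    if(rem < half) { return kept; }       // below the midpoint: round down
    return kept + (kept & 1u);            // tie: round up only if kept is odd
}
```

With two bits dropped, the tie `0b1010` stays at `0b10` (already even), while the tie `0b1110` rounds up from `0b11` to `0b100`.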

When I compile test_gemm_fp64 on branch add_mfma_f64 with rocm5.1, I get an incorrect result, but compiling with rocm 9110 gives the correct result. I have recorded this issue in ticket: https://ontrack-internal.amd.com/browse/SWDEV-335738

bug

[composable_kernel/library/src/tensor_operation_instance/gpu/CMakeLists.txt at amd-develop · ROCm/composable_kernel (github.com)](https://github.com/ROCm/composable_kernel/blob/amd-develop/library/src/tensor_operation_instance/gpu/CMakeLists.txt#L67)
> Do not build mha instances if gfx94 targets are not on the target list

However, if one uses:
> CXX=/opt/rocm/bin/amdclang++ cmake -DGPU_TARGETS="gfx1100" -DCMAKE_PREFIX_PATH=/opt/rocm -DCMAKE_BUILD_TYPE=Release...

urgency_high

For some reason, if CI previously passed for an older commit, GitHub enables the "merge" option even though CI for the new commit has not finished.

bug

This issue tracks the problems encountered while developing avx2 CK. 1. CPU-only compile. A lot of headers include `hip_runtime.h` and use the `__device__` / `__host__` qualifiers to mark host/device code....
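One common way to approach the CPU-only compile problem described above is to hide the qualifiers behind a macro that expands to nothing when no HIP compiler is in use. The sketch below assumes that approach; the macro `CK_SKETCH_HOST_DEVICE` and the function `sketch_ceil_div` are hypothetical names for illustration and are not taken from the CK source. `__HIPCC__` is the macro HIP compilers define when compiling HIP code.

```cpp
// Hypothetical portability shim (names are illustrative, not CK's).
#if defined(__HIPCC__)
// HIP compilation: pull in the runtime header and use the real qualifiers.
#include <hip/hip_runtime.h>
#define CK_SKETCH_HOST_DEVICE __host__ __device__
#else
// CPU-only compilation: no HIP header, and the qualifiers expand to nothing.
#define CK_SKETCH_HOST_DEVICE
#endif

// A small helper that is usable from both host and device code under HIP,
// and compiles as an ordinary function with a plain host compiler.
CK_SKETCH_HOST_DEVICE constexpr int sketch_ceil_div(int x, int y)
{
    return (x + y - 1) / y;
}
```

With a plain host compiler, `sketch_ceil_div(7, 2)` evaluates to 4 without `hip_runtime.h` ever being included.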

- [ ] Have a PR that fixes every Device Op class. Issue brought up by @j4yan: https://github.com/ROCmSoftwarePlatform/composable_kernel/pull/128#discussion_r832736874 Example fix (needs to be applied to all Device Ops): ``` ---...

code quality