
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

Results 151 composable_kernel issues

GridwiseGemm implementations guarantee that there won't be out-of-range (padded) addresses along K0/K1 in the main loop, but they do not currently leverage that fact. @asroy once suggested adding a...

enhancement

# major
- [x] Add license
- [x] use absolute path for header
- [x] fix header dependency: https://github.com/ROCmSoftwarePlatform/composable_kernel/issues/170
- [ ] Tensor operation naming: grid/block/warp/thread-level operation, conv/gemm/reduce/elementwise operation -...

code quality

_Originally posted by @j4yan in https://github.com/ROCmSoftwarePlatform/composable_kernel/pull/128#discussion_r832741425_
# IdentityValue
This value is a mathematical property of the reduction type; it should be deterministic and not specified by the user of the reduction. https://docs.oracle.com/javase/tutorial/collections/streams/reduction.html #...

code quality
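To illustrate the IdentityValue point above, here is a minimal host-side sketch in which each reduction operation fixes its own identity value, so the caller never supplies one. The names `Add`, `Max`, `Identity()` and `Reduce` are hypothetical and chosen for illustration; they are not taken from the library.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical reduction operations (not the library's types).
// Each one defines its identity value as a property of the operation itself.
template <typename T>
struct Add
{
    static constexpr T Identity() { return T(0); }
    constexpr T operator()(T a, T b) const { return a + b; }
};

template <typename T>
struct Max
{
    // lowest() is the identity for max: max(lowest, x) == x for every x.
    static constexpr T Identity() { return std::numeric_limits<T>::lowest(); }
    constexpr T operator()(T a, T b) const { return a > b ? a : b; }
};

// Generic reduction: the starting accumulator comes from the operation,
// not from a user-supplied argument.
template <typename Op, typename T>
T Reduce(const std::vector<T>& data)
{
    Op op{};
    T acc = Op::Identity();
    for(const T& x : data)
    {
        acc = op(acc, x);
    }
    return acc;
}

// Usage: no identity value appears at the call site.
inline void ExampleUsage()
{
    const std::vector<int> v{3, -1, 4};
    assert(Reduce<Add<int>>(v) == 6);
    assert(Reduce<Max<int>>(v) == 4);
}
```

With this shape, reducing an empty range simply returns the operation's identity, and a caller cannot pass a value that is inconsistent with the operation.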

Having add_example_test() separated from add_example_executable() enables more test command lines to be added for each example. Having add_example_test() also enables the developer to use one example .cpp and executable to...

The CI is conservative in reusing build artifacts from earlier stages. For example, benchmarking in the 4th stage could reuse the device instances library from the 2nd or 3rd stage, but CI simply...

MakeArgument() can assert on incompatible problem sizes and trigger a full program exit before the profiler can finish benchmarking. Since we already have a mechanism for checking problem compatibility in IsSupportedArgument() without triggering...

code quality
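The concern in the MakeArgument() item above can be sketched as follows: a profiler-style loop consults a non-fatal compatibility check and skips unsupported problem sizes instead of aborting. `GemmProblem`, `ToyGemmOp`, the `KPerBlock` value and `CountRunnable` are assumptions made for this illustration; only the name `IsSupportedArgument()` comes from the issue text, and its real signature may differ.

```cpp
#include <cassert>
#include <vector>

// Hypothetical problem description (not the library's argument type).
struct GemmProblem
{
    int M;
    int N;
    int K;
};

// Hypothetical operation with an assumed tile size along K.
struct ToyGemmOp
{
    static constexpr int KPerBlock = 8; // assumption for illustration

    // Non-fatal check: reports incompatibility by returning false
    // rather than asserting.
    static bool IsSupportedArgument(const GemmProblem& p)
    {
        return p.M > 0 && p.N > 0 && p.K > 0 && p.K % KPerBlock == 0;
    }
};

// Profiler-style loop: unsupported sizes are skipped, so one bad
// problem size does not end the whole benchmarking run.
inline int CountRunnable(const std::vector<GemmProblem>& problems)
{
    int runnable = 0;
    for(const GemmProblem& p : problems)
    {
        if(!ToyGemmOp::IsSupportedArgument(p))
        {
            continue; // skip instead of exiting the program
        }
        ++runnable; // a real profiler would launch and time the kernel here
    }
    return runnable;
}
```

The design choice shown is only that incompatibility surfaces as a return value the caller can act on, which keeps the decision to continue or stop with the profiler.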

When compiling all tests on branch fp16_transfer_to_bf16, the compiler gives an error: `fatal error: error in backend: SmallVector unable to grow. Requested capacity (4294967296) is larger than maximum value for...

bug

Our kernel sees partial 2-way bank conflicts for K-contiguous matrices, and full 2-way/4-way conflicts for MN-contiguous matrices (dependent on tile sizes). Profiling has shown partial 2-way bank conflicts for K-contiguous...

Performance Issue

We only support 2D (DeviceBinaryElementWise2D) so far. Support more dimensions for flexibility.

code quality
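As a rough illustration of what the request above for more dimensions could mean, the following host-side reference is templated on rank rather than fixed to 2D. `BinaryElementwiseND` and its parameters are hypothetical and are not the library's device operation; a stride of 0 is used here to express broadcasting.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for a rank-generic binary elementwise op.
// Output c is written packed in row-major order (last dimension fastest).
template <std::size_t NDim, typename T, typename Op>
void BinaryElementwiseND(const std::array<std::size_t, NDim>& lengths,
                         const std::array<std::size_t, NDim>& a_strides,
                         const std::array<std::size_t, NDim>& b_strides,
                         const std::vector<T>& a,
                         const std::vector<T>& b,
                         std::vector<T>& c,
                         Op op)
{
    std::size_t total = 1;
    for(std::size_t len : lengths)
    {
        total *= len;
    }
    c.resize(total);

    for(std::size_t linear = 0; linear < total; ++linear)
    {
        // Decompose the packed index into per-dimension coordinates and
        // accumulate the strided offsets into a and b.
        std::size_t rem   = linear;
        std::size_t a_off = 0;
        std::size_t b_off = 0;
        for(std::size_t d = NDim; d-- > 0;)
        {
            const std::size_t idx = rem % lengths[d];
            rem /= lengths[d];
            a_off += idx * a_strides[d];
            b_off += idx * b_strides[d];
        }
        assert(a_off < a.size() && b_off < b.size());
        c[linear] = op(a[a_off], b[b_off]);
    }
}
```

For example, with lengths {2, 2, 2}, a packed `a` with strides {4, 2, 1}, and `b` holding two values with strides {0, 0, 1}, each element of `a` is combined with the `b` value selected by its last coordinate.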