Eugene Zhulenev
Dynamic offsets in `DefaultEpilogue` allow pointer arithmetic to be moved to the device: the `C` and `D` pointers are shifted by offsets stored in device memory. Depends on https://github.com/NVIDIA/cutlass/pull/1273
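The idea can be sketched in plain host-side C++ as below. This is only an illustration under stated assumptions, not the CUTLASS implementation: `EpilogueArgsSketch`, `ResolvedPointers`, and `resolve_pointers` are hypothetical names, and the offsets are assumed to be element counts stored behind pointers that are dereferenced only when the epilogue runs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical argument bundle. These names are illustrative and are NOT
// the actual fields of CUTLASS `DefaultEpilogue::Arguments`.
template <typename Element>
struct EpilogueArgsSketch {
  Element const* ptr_C = nullptr;          // base pointer of source tensor C
  Element* ptr_D = nullptr;                // base pointer of destination tensor D
  std::int64_t const* offset_C = nullptr;  // element offset for C (assumed to live in device memory)
  std::int64_t const* offset_D = nullptr;  // element offset for D (assumed to live in device memory)
};

template <typename Element>
struct ResolvedPointers {
  Element const* ptr_C;
  Element* ptr_D;
};

// In a real kernel this step would run on the device at the start of the
// epilogue. A null offset pointer is treated as "no dynamic offset".
template <typename Element>
ResolvedPointers<Element> resolve_pointers(EpilogueArgsSketch<Element> const& args) {
  assert(args.ptr_D != nullptr);
  std::int64_t off_c = args.offset_C ? *args.offset_C : 0;
  std::int64_t off_d = args.offset_D ? *args.offset_D : 0;
  // C is optional (for example when no source tensor is read), so keep null as null.
  return {args.ptr_C ? args.ptr_C + off_c : nullptr, args.ptr_D + off_d};
}
```

Because the offsets are read through pointers at resolve time, the same argument bundle can be reused while the stored offsets change between runs, which is the property the description relies on.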
Clang built from source: https://clang.llvm.org/get_started.html

```
../llvm-project/build/bin/clang -v
clang version 18.0.0git (https://github.com/llvm/llvm-project.git a855b2c894444419c3689aff6fd0381fdeb02491)
```

main.cpp

```
#include <iostream>
#include "cutlass/epilogue/collective/collective_builder.hpp"

int main() {
  cutlass::half_t x = 2.25_hf;
  std::cout
```
### Request description

Triton relies heavily on auto-tuning to select the best kernel at run time. The tunable parameters are typically tile/block sizes, and they also affect the grid size...
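To illustrate why tile sizes affect the grid size, here is a minimal C++ sketch assuming the common scheme of one program (thread block) per output tile. `TileConfig`, `Grid`, `ceil_div`, and `grid_for` are hypothetical helpers for illustration and are not part of Triton or CUTLASS.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tunable parameters: the tile (block) extent along M and N.
struct TileConfig {
  std::int64_t block_m;
  std::int64_t block_n;
};

// Number of programs launched along each grid dimension.
struct Grid {
  std::int64_t x;
  std::int64_t y;
};

// Ceiling division for a non-negative dividend and a positive divisor.
inline std::int64_t ceil_div(std::int64_t a, std::int64_t b) {
  assert(a >= 0 && b > 0);
  return (a + b - 1) / b;
}

// One program per output tile, so the grid depends directly on the tile size.
inline Grid grid_for(std::int64_t m, std::int64_t n, TileConfig cfg) {
  return {ceil_div(m, cfg.block_m), ceil_div(n, cfg.block_n)};
}
```

For a 1000 x 512 problem, a 128 x 64 tile gives an 8 x 8 grid while a 64 x 128 tile gives a 16 x 4 grid, so the grid is only known once the tuned configuration has been chosen.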