Kunwar Grover

Results: 41 issues by Kunwar Grover

The prefetch pass assumes that shared memory can be reused in the prologue. This may not be true when nested loops are involved, so we need to explicitly insert a...

This flag https://github.com/iree-org/iree/blob/d834aa7357179e0d806f3634d2efe3af2fa45171/compiler/src/iree/compiler/Codegen/LLVMGPU/KernelConfig.cpp#L90 enables software prefetching for kernels using shared memory. Software prefetching is disabled by default, and only enabled by this flag. Over time, prefetching became part of GPU...

codegen
quality of life 😊
codegen/rocm

This flag https://github.com/iree-org/iree/blob/d834aa7357179e0d806f3634d2efe3af2fa45171/compiler/plugins/target/ROCM/ROCMTarget.cpp#L93 sets a waves-per-eu attribute for LLVM compilation on **every dispatch** to give the register allocator a hint. https://github.com/iree-org/iree/pull/17365 introduced a way to specify these LLVM function attributes...

codegen
codegen/llvm
quality of life 😊

This flag https://github.com/iree-org/iree/blob/d834aa7357179e0d806f3634d2efe3af2fa45171/compiler/src/iree/compiler/Codegen/Common/PolynomialApproximationPass.cpp#L17 disables polynomial approximation for most math dialect operations, for use on hardware that supports these operations directly. It looks like some backends rely on this flag for performance...

codegen
quality of life 😊

Depends on https://github.com/iree-org/iree/pull/18780 and https://github.com/iree-org/iree/pull/18771

Post-softmax, the output lies in the range [0, 1]. For low-precision types (like fp8), we scale the output range to [0, fpMax], so we can use more of...
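A minimal sketch of the scaling arithmetic, assuming a single attention row in plain Python. The names `scaled_softmax`, `attention_row`, and `fp_max` are hypothetical (`fp_max` plays the role of fpMax; 448.0, the largest finite value of the OCP float8 E4M3FN format, is used only as an example). The actual cast to fp8 is omitted, and dividing by `fp_max` after the P·V product is an assumption about how the scale is undone, not a description of IREE's implementation.

```python
import math


def softmax(row):
    # Numerically stable softmax: outputs lie in [0, 1] and sum to 1.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_softmax(row, fp_max):
    # Stretch the [0, 1] softmax output to [0, fp_max] before it would be
    # cast to a low-precision type (cast omitted in this sketch).
    return [p * fp_max for p in softmax(row)]


def attention_row(scores, values, fp_max):
    p = scaled_softmax(scores, fp_max)
    # P @ V for one row: acc[d] = sum_j p[j] * values[j][d].
    acc = [
        sum(p[j] * values[j][d] for j in range(len(p)))
        for d in range(len(values[0]))
    ]
    # Assumed: divide the scale back out so the result matches
    # unscaled attention.
    return [a / fp_max for a in acc]
```

Because the scale is a single constant, it commutes with the P·V matmul, so it can be removed once at the end rather than per element of P.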

`transfer_gather` is distributed like `transfer_read` on non-gathered dimensions and like `vector.gather` on gathered dimensions.
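A conceptual model of that split, assuming a 2-D gather where only dimension 0 is gathered through an index vector and the result is tiled in blocks across a hypothetical grid of `threads_rows x threads_cols` threads. Sequential Python loops stand in for threads, and the function names are illustrative, not IREE APIs. On the non-gathered dimension each thread reads a contiguous slice at a computed offset (the `transfer_read`-like part); on the gathered dimension each thread looks up its own source rows through the index vector (the `vector.gather`-like part).

```python
def gather_2d(source, row_indices):
    # Reference semantics: result[i][j] = source[row_indices[i]][j].
    return [list(source[r]) for r in row_indices]


def distributed_gather_2d(source, row_indices, threads_rows, threads_cols):
    rows, cols = len(row_indices), len(source[0])
    # Assumes the result shape divides evenly across the thread grid.
    assert rows % threads_rows == 0 and cols % threads_cols == 0
    tile_r, tile_c = rows // threads_rows, cols // threads_cols
    result = [[None] * cols for _ in range(rows)]
    for tr in range(threads_rows):
        for tc in range(threads_cols):
            # Non-gathered dim (columns): a contiguous slice starting at a
            # per-thread offset, like a distributed transfer_read.
            c0 = tc * tile_c
            for i in range(tr * tile_r, (tr + 1) * tile_r):
                # Gathered dim (rows): each thread fetches its own index
                # and reads the source row it points to, like vector.gather.
                src_row = source[row_indices[i]]
                result[i][c0:c0 + tile_c] = src_row[c0:c0 + tile_c]
    return result
```

Whatever the thread grid, the assembled per-thread tiles should equal the undistributed gather, which is the property the distribution has to preserve.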