Benoit Jacob issues

Results: 54 issues of Benoit Jacob

Fuse mmt4d ukernel with consumer and get perfect codegen thanks to store-to-load forwarding

`mmt4d`->consumer fusion is a major codegen optimization opportunity. The typical consumers include the `unpack` following the `mmt4d`, and element-wise ops (generics with only `parallel` iterators) that are typically found as...

performance ⚡

codegen/llvm

Warn when --iree-llvmcpu-target-cpu defaults to "generic".

Progress on https://github.com/iree-org/iree/issues/18561. This PR supersedes https://github.com/iree-org/iree/pull/18587. It introduces a new command-line flag, `--iree-llvmcpu-logging-unspecified-target-cpu`, with three values:
* Empty string (the default) preserves the current behavior of silently falling...

codegen/llvm

compiler/tools

quality of life 😊

GPU target parameters for data tiling.

This replaces some constants that were hardcoded in GPUMaterializeEncoding.cpp by actual GPU target parameters. The logic in `getSwizzle` was doing wonky things with its own local `const int targetPreferredLoadBitWidth =...

GPU data tiling: query the target's list of MMA intrinsics. Add FP8 test.

The current code had its own list of MFMA intrinsics that we can use, then checked that against the target. Flipping this around, we can simply query the list from...
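The inversion described in this snippet can be illustrated with a small sketch. Everything named below (`Target`, `mma_intrinsics`, `pick_intrinsic`, and the intrinsic strings) is a hypothetical placeholder rather than IREE's actual API; the only point is that the candidate list comes from the target itself instead of from a separate hardcoded list that is then checked against the target.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Target:
    """Hypothetical stand-in for a GPU target description."""
    name: str
    # Intrinsics the target itself advertises, in order of preference.
    mma_intrinsics: List[str] = field(default_factory=list)


def pick_intrinsic(target: Target, element_type: str) -> Optional[str]:
    """Return the first intrinsic advertised by the target for element_type.

    No separate hardcoded list is consulted: adding an intrinsic to the
    target description is enough for it to be considered here.
    """
    for intrinsic in target.mma_intrinsics:
        # Assumption for this sketch only: the element type is encoded
        # as a suffix of the (made-up) intrinsic name.
        if intrinsic.endswith(element_type):
            return intrinsic
    return None


example = Target("example_gpu", ["EXAMPLE_MMA_F16", "EXAMPLE_MMA_F8"])
print(pick_intrinsic(example, "F8"))
```

With this shape, supporting a new element type on a target needs a change in one place, the target description, rather than in both the target and the consumer's private list.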

Populate `max_load_instruction_bits`, `simds_per_wgp`, `vgpr_space_bits` on all GPUs

In https://github.com/iree-org/iree/pull/18839 we are introducing 3 new fields to `TargetGpuAttr`: `max_load_instruction_bits`, `simds_per_wgp`, `vgpr_space_bits` on all GPUs. For now they are only populated for CDNA3. They should be populated for other...

codegen

codegen/spirv

codegen/rocm

GPUMaterializeEncoding: expand-to-subgroups in both M and N dimensions

The current tile-selection heuristic in GPUMaterializeEncoding only ever expands to subgroups in the N dimension, never in the M dimension. That keeps this logic a little simpler, but...

GPUMaterializeEncoding: tune for narrow cases

The tile size selection heuristic in GPUMaterializeEncoding is focused on the generic case of non-narrow shapes; then at the end, a fix-up is applied to adjust to narrow shapes. This...

Adapt to `quant` dialect change.

https://github.com/llvm/llvm-project/pull/100667 renamed a header, so this adapts the `#include`. I need to cherry-pick this commit in IREE as we are integrating these llvm-project changes. You will need to apply this...

Finite Math Assumption makes WebGPU a difficult compilation target

This part of the spec, https://gpuweb.github.io/gpuweb/wgsl/#differences-from-ieee754

> Finite Math Assumption:
> * [Overflow](https://gpuweb.github.io/gpuweb/wgsl/#ieee754-overflow), infinities, and NaNs generated before [shader execution](https://gpuweb.github.io/gpuweb/wgsl/#shader-execution-start) [will](https://gpuweb.github.io/gpuweb/wgsl/#behavioral-requirements) generate errors.

is very hard to satisfy for compilers...
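As an illustration of the kind of burden this places on a code generator, the sketch below checks every floating-point constant before emitting it. It is an assumption about what a guard might look like, not a description of any existing compiler: `emit_f32_literal` is a hypothetical helper, and the choice to raise an error (rather than substitute a finite value) is only one possible policy.

```python
import math

# Largest finite float32 value; anything larger in magnitude is treated
# as an f32 overflow in this sketch.
F32_MAX = 3.4028234663852886e38


def emit_f32_literal(value: float) -> str:
    """Format value as a literal with an `f` suffix.

    Hypothetical helper: rejects NaN, infinities, and magnitudes beyond
    the finite float32 range, since such constants would exist before
    shader execution starts.
    """
    if math.isnan(value) or math.isinf(value) or abs(value) > F32_MAX:
        raise ValueError(f"non-finite f32 constant: {value!r}")
    return f"{value!r}f"


print(emit_f32_literal(1.5))
```

A guard like this only covers constants the compiler itself materializes, for example the result of constant folding; it does nothing for values that become non-finite at run time.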

wgsl

wgsl resolved

Fix torch.operator names

This implements the fix suggested by @heshuju in https://github.com/llvm/torch-mlir/issues/4108. It fixes an issue that was blocking the LLVM integrate in IREE. https://github.com/iree-org/iree/actions/runs/15626565095/job/44156258415?pr=21092