Benoit Jacob
# Context

Consider the following 4 linalg named ops and how they differ from each other, by comparing their definitions in the Linalg OpDSL (follow the links):

1. [matmul](https://github.com/llvm/llvm-project/blob/680c780a367bfe1c0cdf786250fd7f565ef6d23d/mlir/python/mlir/dialects/linalg/opdsl/ops/core_named_ops.py#L242-L256)
2. ...
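As a point of reference for comparing the variants, here is a hedged pure-Python sketch of the computation the plain `matmul` OpDSL definition expresses (`C[m, n] += A[m, k] * B[k, n]`; the cast-to-accumulator-type step is elided for brevity):

```python
def matmul_ref(A, B, C):
    """Reference semantics of linalg.matmul: C[m, n] += A[m, k] * B[k, n]."""
    M, K = len(A), len(A[0])
    N = len(B[0])
    for m in range(M):
        for n in range(N):
            for k in range(K):
                # In the real OpDSL definition, A and B elements are first
                # cast to C's accumulator element type before multiplying.
                C[m][n] += A[m][k] * B[k][n]
    return C
```

The other named ops in the list differ only in how their indexing maps permute these `m`, `n`, `k` dimensions over the operands.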
Ukernels do not expose whether they have fast code to handle a given case. They always succeed, potentially falling back on a slow generic tile function that can be 100x...
As pointed out by @KoolJBlack, it is now necessary to pass `--iree-hal-dump-executable-sources-to=` for Tracy sampling to work. That is because Tracy sampling relies on the presence of source files,...
The proposal (PR #15586) is to change the default value of `iree-llvmcpu-enable-ukernels` from `none` to `mmt4d`. Note that the `mmt4d` ukernel is the only ukernel used at all on `llvm-cpu`...
We have `mmt4d` ukernel tile functions for a bunch of narrow-M cases, but they have been added as naive truncations of the general case. Often, that's fine. Sometimes, that results...
Narrow-N and narrow-M cases of matmuls are entirely similar. There is no need to write data-tiling logic and ukernels for all combinations of the two narrow dimensions. We have standardized...
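The standardization on narrow-M can be sketched with the usual transposition identity: a narrow-N matmul `C = A @ B` is computed as the transpose of a narrow-M matmul, `Cᵀ = Bᵀ @ Aᵀ`, so only narrow-M kernels need to exist. A minimal self-contained illustration (the inner matmul stands in for the real narrow-M kernel):

```python
def transpose(X):
    return [list(row) for row in zip(*X)]

def narrow_m_matmul(A, B):
    # Placeholder for the real narrow-M kernel: plain reference matmul.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def narrow_n_matmul(A, B):
    # Reduce the narrow-N case to the narrow-M case by transposing:
    # C = A @ B  ==>  C = (B^T @ A^T)^T, where B^T @ A^T is narrow-M.
    return transpose(narrow_m_matmul(transpose(B), transpose(A)))
```

This is why only one set of narrow-dimension tile functions and data-tiling logic is needed, not the full cross product of narrow-M and narrow-N combinations.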
This issue is a placeholder for future discussion about supporting 4-dimensional-reducing dot-product instructions taking 8bit inputs and accumulating into 32bit, i.e.

```
int32_accumulator += int8_lhs_0 * int8_rhs_0 + ... +
```
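A scalar model of one lane of such an instruction (in the spirit of, e.g., Arm `sdot` or x86 VNNI; this is an illustrative sketch, not any specific ISA's semantics): four int8×int8 products are reduced into a single int32 accumulator.

```python
def dot4_i8_i32(acc, lhs, rhs):
    """One lane of a 4-way-reducing i8 dot product: acc (i32) += sum of 4 i8*i8."""
    assert len(lhs) == len(rhs) == 4
    for a, b in zip(lhs, rhs):
        # Inputs are 8-bit; products fit comfortably in the i32 accumulator.
        assert -128 <= a <= 127 and -128 <= b <= 127
        acc += a * b
    return acc
```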
Our benchmarks on CI run twice, without and with Tracy. This is a substantial cost and latency hit (not quite 2x, as I can see that Tracy is run with...
Do not submit; this is just for testing whether https://github.com/llvm/llvm-project/pull/91800 fixes the problem it is intended to fix. If yes, then we first need to merge it upstream, then the next integrate to...
In LLVM integrate #17330 we have to locally revert https://github.com/llvm/llvm-project/pull/89131 because it causes `vector.interleave` to be created instead of `vector.shuffle`, and some GPU codegen backends expect `vector.shuffle` and are not...