Benoit Jacob
@ScottTodd, should we merge this? And then we can revisit once #15699 is fixed?
Oh yes, sorry that I didn't spot this earlier. There is a misunderstanding about what `mmt4d` does.

> 1. `linalg.mmt4d` assumes that in A*B, it's the B matrix (RHS) that's...
Great analysis, thanks! Indeed, the `f32` value 6.8335e+30 has the binary encoding 0x72ac8075. Truncating this to `bf16` means replacing it with a value whose 32-bit encoding is a multiple of 0x10000, so...
I don't know what code path is used in the code that you ran, but I checked this PyTorch `f32 -> bf16` rounding helper: https://github.com/pytorch/pytorch/blob/main/c10/util/BFloat16.h#L76 And it does return the...
> I used the following C++ code to check the 16-bit truncation result:
>
> ```cpp
> // cast to bfloat16 (truncation: keep the top 16 bits of the f32 encoding)
> bfloat16& operator=(float float_val) {
>   data = (*reinterpret_cast<uint32_t*>(&float_val)) >> 16;
>   return *this;
> }
> ```

This implements the same...
Wow, funny bug that you found here! It appears to be a *parsing* bug, in how `iree-run-module` parses the `--input` flag. Indeed, it produces expected results when the specified array...
The parsing itself is correct, though: `iree_hal_parse_element_unsafe` does parse the correct value, and its caller `iree_hal_parse_buffer_elements` does store it in the destination buffer. And yet, something is producing incorrect...
The bug reproduces whenever the specified `--input` element rounds to `-0.488281` as a `bfloat16` (encoding `0xbefa`). It does not reproduce whenever it rounds to the previous `bfloat16` value `-0.486328` (encoding...
And the other operand, which is hardcoded as a constant in the above testcase, also matters. Here is a testcase taking both operands as arguments:

```
#map = affine_map<(d0) -> (0)>
...
This actually minimizes down to a testcase that performs no bfloat16 arithmetic and only an f32 -> bf16 `truncf`:

```mlir
#map = affine_map<(d0) -> (d0)>
module {
  func.func @main_graph(%arg0: tensor) -> tensor {
    ...
```