Benoit Jacob
@ScottTodd, should we merge this? And then we can revisit once #15699 is fixed?
Oh yes, sorry that I didn't spot this earlier. There is a misunderstanding about what `mmt4d` does.

> 1. `linalg.mmt4d` assumes that in A*B, it's the B matrix (RHS) that's...
Great analysis, thanks! Indeed, the `f32` value 6.8335e+30 has the binary encoding 0x72ac8075. Truncating this to `bf16` means replacing it with a value whose 32-bit encoding is a multiple of 0x10000, so...
I don't know what code path is used in the code that you ran, but I checked this PyTorch `f32 -> bf16` rounding helper: https://github.com/pytorch/pytorch/blob/main/c10/util/BFloat16.h#L76 And it does return the...
> I used the following C++ code to check the 16-bit truncation result:
>
> ```cpp
> // cast to bfloat16 (truncation: keep the top 16 bits of the f32 encoding)
> bfloat16& operator=(float float_val) {
>   data = (*reinterpret_cast<uint32_t*>(&float_val)) >> 16;
>   return *this;
> }
> ```

This implements the same...
Wow, funny bug that you found here! It appears to be a *parsing* bug, in how `iree-run-module` parses the `--input` flag. Indeed, it produces expected results when the specified array...
The parsing itself is correct, though: `iree_hal_parse_element_unsafe` does parse the correct value, and its caller `iree_hal_parse_buffer_elements` does store it in the destination buffer. And yet, something is producing incorrect...
The bug reproduces whenever the specified `--input` element rounds to `-0.488281` as a `bfloat16` (encoding `0xbefa`). It does not reproduce whenever it rounds to the previous `bfloat16` value `-0.486328` (encoding...
And the other operand, which is hardcoded as a constant in the above testcase, also matters. Here is a testcase taking both operands as arguments:

```
#map = affine_map<(d0) -> (0)>
...
This actually minimizes down to a testcase that performs no bfloat16 arithmetic and only an f32 -> bf16 `truncf`:

```mlir
#map = affine_map<(d0) -> (d0)>
module {
  func.func @main_graph(%arg0: tensor) -> tensor {
    ...
```