Benoit Jacob
@rsuderman this might be for you :-) What `--mlir-print-ir-after-all` shows for the testcase in the previous comment:

```mlir
// -----// IR Dump After CSE (cse) //----- //
module {
  func.func...
```
@rsuderman, here is what the equivalent f32->bf16 truncation code does in the runtime (it is actually generic in bit-widths, but in particular it does f32->bf16), specifically to fix up in...
@rsuderman , here is the much more concise and optimized way that the PyTorch runtime does it (I think that part was written by Marat and carried over from XNNPACK...
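For context, here is a minimal sketch of that style of round-to-nearest-even f32->bf16 truncation (my own rewrite of the usual bit trick; the name `f32_to_bf16_rne` and the exact NaN handling are illustrative, not the actual PyTorch/XNNPACK source):

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Sketch of round-to-nearest-even f32 -> bf16 truncation
 * (hypothetical helper, not the actual PyTorch/XNNPACK code). */
static inline uint16_t f32_to_bf16_rne(float value) {
  uint32_t bits;
  memcpy(&bits, &value, sizeof bits); /* type-pun without UB */
  if (isnan(value)) {
    /* Fix up NaN explicitly: truncating a NaN whose payload sits entirely
     * in the low 16 bits would otherwise produce an infinity. */
    return UINT16_C(0x7FC0);
  }
  /* Add a bias of 0x7FFF plus the bit that becomes the new LSB, so that
   * dropping the low 16 bits rounds to nearest, ties to even. */
  uint32_t lsb = (bits >> 16) & 1u;
  uint32_t rounding_bias = UINT32_C(0x7FFF) + lsb;
  return (uint16_t)((bits + rounding_bias) >> 16);
}
```

The explicit NaN check is the fix-up part mentioned in the previous comment: plain truncation can silently turn a NaN into an infinity.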
It's OK, I think I'll have the patch ready soon.
@Shukla-Gaurav, this seems to work. I'll fix up any unit test that fails and send that for review, @rsuderman. https://github.com/llvm/llvm-project/pull/83180
https://github.com/llvm/llvm-project/pull/83180 is merged, so you'll get it in the next integrate, or you can cherry-pick it locally until then to verify that it fixes your issue.
> FP8 is E4M3 (inference-focused) while BF8 is E5M2 (training-focused): https://www.amd.com/en/products/accelerators/instinct/mi300/mi300a.html.

If a model is trained with 2-bit mantissas (E5M2), how is the 3rd bit of mantissa in E4M3 going...
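For reference, the bit layouts behind this question (a sketch, not tied to any particular library's types): E4M3 is 1 sign, 4 exponent, and 3 mantissa bits, while E5M2 is 1 sign, 5 exponent, and 2 mantissa bits, so the two formats trade an exponent bit for a mantissa bit.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative only (the helper name is mine, not any library's API):
 * decompose an 8-bit float encoding into its fields.
 *   E4M3: 1 sign | 4 exponent | 3 mantissa bits
 *   E5M2: 1 sign | 5 exponent | 2 mantissa bits */
static void print_fp8_fields(uint8_t bits, int exp_bits, int man_bits) {
  unsigned sign = bits >> (exp_bits + man_bits);
  unsigned exponent = (bits >> man_bits) & ((1u << exp_bits) - 1u);
  unsigned mantissa = bits & ((1u << man_bits) - 1u);
  printf("sign=%u exponent=%u mantissa=%u\n", sign, exponent, mantissa);
}

int main(void) {
  uint8_t x = 0x56;           /* arbitrary example byte */
  print_fp8_fields(x, 4, 3);  /* read as E4M3 */
  print_fp8_fields(x, 5, 2);  /* read as E5M2 */
  return 0;
}
```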
We took a [look](https://docs.google.com/document/d/1pTl4clEaMHj5n4P0oc5zBckHsK1et4zSARVBPNsQ__M/edit?usp=sharing) at WebAsm SIMD for NN inference here. The relevance to the present issue is that, as there are multiple other issues preventing the WebAsm SIMD...
FYI @AmosLewis, this is the reason why https://github.com/llvm/torch-mlir/pull/3013 was ultimately dropped from integrate #17330.
> Will you start a new PR to bump it next?

I don't plan to do it myself. We have an [integration rotation schedule](https://docs.google.com/spreadsheets/d/17PmDuRWLxJaz1eMi4Lt_qIDjKV8NtBxP3JWox4OFeps/edit#gid=0) and the integrates of this week...