tpp-mlir
Remaining Issues for MLP performance on par with libxsmm-dnn
These are the known issues that remain before we reach libxsmm-dnn performance on "pre-packed layer" MLPs:
- [x] Beta=Zero (see #777, #784)
- [x] XSMM fusion (see #752)
- [ ] Allocation on page boundary (2MB)? (see the allocation sketch after this list)
- [ ] Change loop order with flags? (see the loop-order sketch after this list)
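For the 2MB item, here is a minimal sketch of what a page-boundary allocation could look like on Linux, assuming the goal is to let tensor buffers be backed by transparent huge pages. The helper name and the THP hint are illustrative only, not the tpp-mlir runtime allocator:

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>

#define TWO_MB (2u * 1024u * 1024u)

/* Allocate a buffer starting on a 2 MB boundary and hint the kernel that
 * it may be backed by huge pages. Hypothetical helper for illustration. */
static void *alloc_2mb_aligned(size_t bytes) {
  /* aligned_alloc wants the size to be a multiple of the alignment. */
  size_t rounded = (bytes + TWO_MB - 1) / TWO_MB * TWO_MB;
  void *p = aligned_alloc(TWO_MB, rounded);
#ifdef MADV_HUGEPAGE
  if (p) madvise(p, rounded, MADV_HUGEPAGE); /* Linux-only THP hint */
#endif
  return p;
}
```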
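For the loop-order item, a sketch of what changing the loop order could mean for the blocked (pre-packed) layer: the same block kernel can be driven with the M-block loop or the N-block loop outermost, which changes which packed operand stays resident in cache. `block_gemm` is a stand-in for the block kernel, not a real API:

```c
#include <stddef.h>

typedef void (*block_gemm_fn)(size_t mb, size_t nb); /* runs one MB x NB tile */

/* M-block loop outermost: the A blocks of one row are reused across nb. */
static void layer_m_outer(block_gemm_fn block_gemm, size_t m_blocks, size_t n_blocks) {
  for (size_t mb = 0; mb < m_blocks; ++mb)
    for (size_t nb = 0; nb < n_blocks; ++nb)
      block_gemm(mb, nb);
}

/* N-block loop outermost: the packed weight blocks of one column are reused across mb. */
static void layer_n_outer(block_gemm_fn block_gemm, size_t m_blocks, size_t n_blocks) {
  for (size_t nb = 0; nb < n_blocks; ++nb)
    for (size_t mb = 0; mb < m_blocks; ++mb)
      block_gemm(mb, nb);
}
```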
In theory, once all of those are in, we should reach parity. If more issues are discovered, please add them to the list. Let's only close this issue once we reach parity on the base pre-packed MLP benchmarks we have.
@chelini @alheinecke
Beta=0 is done and the benchmark IR is affected, but we only saw a <1% performance change from it, probably within noise. We didn't expect a huge change, so it's not a big deal.
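For context, a plain-C sketch of what beta=0 buys in a GEMM-based layer: with beta=1 every output tile is read-modify-written, while with beta=0 the old C values are never loaded, saving one read stream over C per tile. The kernel below is illustrative only, not the libxsmm/tpp-mlir kernel:

```c
#include <stddef.h>

/* C = beta*C + A*B for row-major MxK * KxN tiles. */
static void gemm_tile(const float *A, const float *B, float *C,
                      size_t M, size_t N, size_t K, float beta) {
  for (size_t m = 0; m < M; ++m)
    for (size_t n = 0; n < N; ++n) {
      /* beta = 0: skip loading the old C value entirely. */
      float acc = (beta == 0.0f) ? 0.0f : beta * C[m * N + n];
      for (size_t k = 0; k < K; ++k)
        acc += A[m * K + k] * B[k * N + n];
      C[m * N + n] = acc;
    }
}
```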