Need E2E ONNX op tests in CI
Problem: We don't test ONNX ops in our CI
- sometimes bad ONNX lowerings slip past code review
- SHARK-TestSuite catches some of them, but it's easy to improperly write test cases that don't actually run
- ONNX lowerings regress and get missed because SHARK-TestSuite isn't run as part of the CI
- while IREE, a downstream project, does test ONNX nodes numerically, it's sometimes hard to tell whether an ONNX failure is caused by something in IREE or in torch-mlir
For example, we have these ONNX ops that have made their way into torch-mlir but ultimately don't run in IREE's test suites:
- LSTM (I wrote this and thought it worked based on SHARK-TestSuite!)
- STFT
- HardMax (and many, many more!)
If we have some ONNX node tests in the torch-mlir CI:
- if an op works here, we know that any remaining failure is a downstream problem in IREE
- if an op doesn't work, we know exactly why, because the error messages and failures will be right there in the CI
- if an op regresses, we know exactly who & what is responsible
Problems with existing solutions
Our existing test-suite
We have an existing test-suite in projects/pt1 that imports a lot of PyTorch ops and performs numerical comparison against native PyTorch via a variety of paths, including ONNX.
There are two main problems with this:
- some ONNX ops don't have PyTorch analogues and cannot be tested here
- for ops that represent layers and carry weights, the existing testing infrastructure generates the weights separately for PyTorch and for torch-mlir, so those test cases always fail numerically.
Testing downstream in IREE
Our downstream project IREE does run good torch node tests, but it reports many of the ONNX ops that we've lowered as failing. I haven't found a way to view the error messages, and it's hard to tell whether these failures are due to IREE or to torch-mlir.
Proposed solution:
We should add a CI script and some testing scripts to torch-mlir that:
- download models, test inputs, and reference outputs from the official ONNX op test-suite
  - @scotttodd has it converted to MLIR and stored in SHARK-TestSuite here
- run these test cases
- report the results on CI (see the sketch below for what such a runner could look like)
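For concreteness, here is a minimal sketch of the per-test comparison. It only assumes the directory layout of the official ONNX node tests (model.onnx plus test_data_set_*/input_*.pb and output_*.pb); `run_with_torch_mlir` is a hypothetical placeholder for whichever compile-and-run path we end up using, not an existing API:

```python
import glob
import os

import numpy as np
import onnx
from onnx import numpy_helper


def load_tensors(paths):
    """Read serialized TensorProto files (input_*.pb / output_*.pb) as numpy arrays."""
    tensors = []
    for path in sorted(paths):
        proto = onnx.TensorProto()
        with open(path, "rb") as f:
            proto.ParseFromString(f.read())
        tensors.append(numpy_helper.to_array(proto))
    return tensors


def run_node_test(test_dir, run_with_torch_mlir):
    """Run one official ONNX node test case against a torch-mlir execution path.

    `run_with_torch_mlir(model, inputs)` is a placeholder: it should import the
    ONNX model through torch-mlir, execute it, and return a list of numpy outputs.
    """
    model = onnx.load(os.path.join(test_dir, "model.onnx"))
    for data_set in sorted(glob.glob(os.path.join(test_dir, "test_data_set_*"))):
        inputs = load_tensors(glob.glob(os.path.join(data_set, "input_*.pb")))
        expected = load_tensors(glob.glob(os.path.join(data_set, "output_*.pb")))
        actual = run_with_torch_mlir(model, inputs)
        for exp, act in zip(expected, actual):
            np.testing.assert_allclose(act, exp, rtol=1e-3, atol=1e-5)
```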
Maybe we can use onnxruntime to directly plug into ONNX's own op tests and not have to write additional data / model preprocessing scripts:
https://github.com/nod-ai/onnxruntime/tree/iree_ep/onnxruntime/core/providers/iree
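As a rough illustration of what "plugging into ONNX's tests" looks like, ONNX's own `onnx.backend.test.BackendTest` runner generates test cases for any backend that implements the ONNX backend API. The sketch below uses the stock onnxruntime backend as a stand-in; the assumption is that a backend routed through torch-mlir would expose the same interface:

```python
# Sketch: expose the official ONNX node tests as pytest/unittest cases for a backend.
import onnx.backend.test
import onnxruntime.backend as ort_backend  # stand-in; a torch-mlir-based backend would go here

# BackendTest discovers the official ONNX op tests and generates one test case
# per op / test data set for the given backend.
backend_test = onnx.backend.test.BackendTest(ort_backend, __name__)

# Make the generated cases collectable by pytest / unittest.
globals().update(backend_test.test_cases)
```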
> Our downstream project IREE does run good torch node tests, but it reports many of the ONNX ops that we've lowered as failing. I haven't found a way to view the error messages, and it's hard to tell whether these failures are due to IREE or to torch-mlir.
I archived some historical logs here:
- https://gist.github.com/ScottTodd/1a02531cc76a3b8566428207e39d1870
- https://gist.github.com/ScottTodd/ecc9c57c01bfc5e996a15cdd38df6a9c
At the time I decided that the full output would be too noisy to include on all CI runs. The list of failures may be small enough now to revise that decision. Generally, you can run pytest with -rA (https://docs.pytest.org/en/stable/how-to/output.html) to see output from XFAIL'd tests, or run with --ignore-xfails (see other custom flags in the conftest.py file).
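For reference, a flag like that only needs a few lines of pytest hooks. This is a generic sketch (not necessarily how SHARK-TestSuite's conftest.py actually implements --ignore-xfails), reusing pytest's built-in runxfail behavior:

```python
# conftest.py (sketch)
def pytest_addoption(parser):
    parser.addoption(
        "--ignore-xfails",
        action="store_true",
        default=False,
        help="Run tests marked xfail as if they were unmarked, reporting real pass/fail.",
    )


def pytest_configure(config):
    if config.getoption("--ignore-xfails"):
        # Same effect as pytest's built-in --runxfail flag.
        config.option.runxfail = True
```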
@rsuderman do you have some references I could look at on how to run torch-mlir and get numerical results without using IREE?
We had some good experience with the onnx.reference evaluator in cases where onnxruntime lacked support for some ops or dtypes (e.g. bfloat16).
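For context, the reference evaluator ships with onnx itself and runs a model purely in numpy, which makes it handy for generating golden outputs when onnxruntime can't. A minimal usage sketch, where "model.onnx" and the input name "X" are placeholders for an actual test case:

```python
import numpy as np
from onnx.reference import ReferenceEvaluator  # available in onnx >= 1.13

# "model.onnx" and the input name "X" are placeholders for a real test case.
sess = ReferenceEvaluator("model.onnx")
x = np.random.rand(2, 3).astype(np.float32)
outputs = sess.run(None, {"X": x})  # list of numpy arrays, one per model output
print(outputs[0])
```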
@renxida Hi! When you say that these ops fail, do you expect them to have linalg lowerings?
@vinayakdsci yup! I'm expecting them to work e2e.
In an ideal world, instead of pushing many ops through one layer at a time and then coming back later to push them through the next layer while trying to remember how our old implementations work, I'd like us to push each op all the way through before moving on to the next one.
@renxida I agree :) But I just wanted to point out that many ops could be failing because of missing torch-to-linalg lowerings. And don't worry, I am sure we will be able to push them through!