onnx-mlir
Initialize conversion passes from ONNX to Torch-MLIR backend contract
Includes LLVM lit tests for ONNXAddOp and ONNXConstantOp. This PR is a result of the RFC discussed in #1639. The following overview of the PR is collected from there, but see the RFC for a more detailed discussion. The PR adds four passes to the conversion pipeline (see the example invocation after this list):
- "convert-onnx-to-torch" handles the op-by-op lowering to the torch dialect
- "convert-function-types-to-torch-types" converts function arguments to torch types (e.g. torch.vtensor) that weren't converted in the previous pass
- "finalize-torch-type-conversion" finalizes the conversion to torch types and removes any remaining UnrealizedConversionCastOps
- "erase-onnx-entry-point" removes the ONNX entry-point op; for now this is needed for compatibility with torch-mlir
As requested, there is a CMake option, ONNX_MLIR_ENABLE_TORCH, to disable the torch-mlir build.
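For example, the integration could be switched off at configure time roughly like this (a sketch; the remaining flags follow the usual onnx-mlir build setup, with MLIR_DIR pointing at your MLIR CMake directory):

```sh
# Configure onnx-mlir with the Torch-MLIR integration disabled
# (ONNX_MLIR_ENABLE_TORCH is the option added by this PR).
cmake -G Ninja .. \
  -DMLIR_DIR=${MLIR_DIR} \
  -DONNX_MLIR_ENABLE_TORCH=OFF
```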
One point of awkwardness here comes from torch-mlir including llvm-project as a submodule. When torch-mlir is added to onnx-mlir as a submodule, cloning recursively therefore clones the entirety of llvm-project as well (it isn't built, since we only build the parts of torch-mlir that we need). Suggestions on how to manage the submodules in this case, as well as any comments on whether/how to break up this PR, are welcome.
If there is anything in particular needed to help with reviewing here please let me know.
Can one of the admins verify this patch?
Thank you for the great work! You could probably configure a shallow submodule clone.
```sh
git config -f .gitmodules submodule.third_party/torch-mlir.shallow true
```
> Thank you for the great work! You could probably configure a shallow submodule clone.
Hmm, as far as I can tell a shallow submodule clone won't prevent the full llvm-project submodule in Torch-MLIR from being cloned when cloning with
```sh
git clone --recursive https://github.com/onnx/onnx-mlir.git
```
Correct me if I'm wrong, but it didn't work when I tried your suggestion. I opened an issue in Torch-MLIR about formal support for this use case (https://github.com/llvm/torch-mlir/issues/1411).
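One interim workaround, assuming the nested llvm-project checkout inside torch-mlir isn't needed for the build, is to skip --recursive and initialize only onnx-mlir's direct submodules, since git submodule update without --recursive stops at the first level:

```sh
# Clone without recursing into nested submodules
git clone https://github.com/onnx/onnx-mlir.git
cd onnx-mlir
# Initialize only the direct submodules; torch-mlir's own
# llvm-project submodule is left uninitialized.
git submodule update --init
```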
@jenkins-droid test this please
It looks like CMake on Windows isn't able to find the Torch-MLIR external dialects (TMTensor). I'll have to set up a Windows environment to iron out the CI problems.
I can convert this PR to a draft until I verify that the Windows build works, but style comments/reviews are still welcome at this point.
You need to abide by the DCO; see the Contributing section of the main README.md. Every commit needs to be signed off. The check can be overridden on this page (see the details next to DCO), but this should be used only exceptionally.
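For reference, the sign-off itself is a standard git feature (the commit message below is illustrative):

```sh
# Add the Signed-off-by trailer required by DCO to a new commit
git commit -s -m "Lower ONNXAddOp to torch dialect"
# Or retroactively sign off existing commits on the branch,
# e.g. the last three:
git rebase --signoff HEAD~3
```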
I pushed a new change that switches to a temporary fork of Torch-MLIR, which will be used to iterate on this PR without needing to create PRs in Torch-MLIR for each small change. Once the review is done here, we can switch back to the main branch of Torch-MLIR. With this I was able to build successfully in the Docker image used by the CI, with all unit tests passing, but I haven't tested locally on all platforms.
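For anyone reproducing this locally, pointing a submodule at a fork is typically done along these lines (a sketch; the fork URL below is a placeholder, not the actual fork used in this PR):

```sh
# Point the torch-mlir submodule at a fork (placeholder URL)
git config -f .gitmodules \
  submodule.third_party/torch-mlir.url https://github.com/<user>/torch-mlir.git
# Propagate the .gitmodules change to the local git config
git submodule sync third_party/torch-mlir
# Fetch and check out the submodule at the recorded commit
git submodule update --init third_party/torch-mlir
```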
With this, if there are more CI failures, especially as this PR is reviewed, I am wondering whether there is a way to set things up so that running the CI doesn't require admin approval each time, to allow for faster iteration (perhaps on a temporary branch within ONNX-MLIR). @AlexandreEichenberger, if you or one of the other admins could help/comment here, that would be greatly appreciated! If this isn't possible, no problem, but I figured I would ask.