
tpp-run does not support ml_program dialect

Open nhasabni opened this issue 1 year ago • 6 comments

Tried running a ResNet exported by torch-mlir to linalg-on-tensors through tpp-run and hit a crash. tpp-opt handles the same file fine, though.

Commands (Install torch-mlir using pip)

$ python examples/torchscript_resnet18_all_output_types.py
$ tpp-opt rn18.mlir -o rn18.mlir.opt
$ tpp-run rn18.mlir.opt -e forward -entry-point-result=void

Error

$ ./tpp-run -e forward -entry-point-result=void rn18.mlir.opt
loc("rn18.mlir.opt":9:3): error: cannot be converted to LLVM IR: missing `LLVMTranslationDialectInterface` registration for dialect for op: ml_program.global
tpp-run: /nfs_home/nhasabni/other/tensor_compiler/nhasabni_tpp-sandbox/tools/tpp-run/tpp-run.cpp:199: std::unique_ptr<llvm::Module> lowerToLLVMIR(mlir::Operation *, llvm::LLVMContext &): Assertion `llvmModule' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:

nhasabni avatar Sep 05 '23 23:09 nhasabni

ml_program seems to be a dead end in upstream MLIR.

The basic dialect ops are defined (which is why tpp-opt is fine with it); however, there are no conversion passes or any further integration. The dialect looks like a stub for frontend conversion but not much more. IREE has custom lowering passes for ml_program (see iree/compiler/MHLO/MHLOToLinalgOnTensors.cpp), but I see nothing relevant available upstream.

Looking at this rn18 example from torch-mlir, the single ml_program.global variable doesn't seem to be used anywhere. So, I hope we can get away with some minor IR cleanup and run the rest as is.

@nhasabni do you think that's feasible, or might the lack of ml_program lowering become a blocker in the near future?
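For reference, the unused global in question presumably looks something like the snippet below (the `@global_seed` name and zero initializer are illustrative, modeled on what torch-mlir typically emits). Since `ml_program.global` implements the `Symbol` interface, an unreferenced private global should in principle be removable by generic symbol DCE:

```mlir
// Illustrative: an unreferenced private global of the kind torch-mlir emits.
// If no ml_program.global_load/global_store references it, symbol DCE can drop it.
ml_program.global private mutable @global_seed(dense<0> : tensor<i64>) : tensor<i64>
```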

adam-smnk avatar Sep 06 '23 08:09 adam-smnk

Could this be an upstream pass that simply bufferizes ml_program.global to memref.global?

rengolin avatar Sep 06 '23 09:09 rengolin

> Could this be an upstream pass that simply bufferizes ml_program.global to memref.global?

I'm not familiar with ml_program use cases, but probably. Or it could lower to a dense tensor, since you usually enter at that abstraction level. The question is whether it should be needed at all; maybe it's just a torch-mlir artefact/leftover.
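A rough sketch of what such a bufferization could look like, assuming a direct ml_program-to-memref mapping (the memref-side ops are existing upstream ops; the lowering pass itself is hypothetical):

```mlir
// Before: tensor-level program state via ml_program.
ml_program.global private mutable @global_seed(dense<0> : tensor<i64>) : tensor<i64>
func.func @forward() {
  %seed = ml_program.global_load @global_seed : tensor<i64>
  // ... rest of forward ...
  return
}

// After (hypothetical lowering): the same state as a memref.global.
memref.global "private" @global_seed : memref<i64> = dense<0>
func.func @forward() {
  %ref = memref.get_global @global_seed : memref<i64>
  %seed = bufferization.to_tensor %ref : memref<i64>
  // ... rest of forward ...
  return
}
```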

adam-smnk avatar Sep 06 '23 09:09 adam-smnk

I see a few ways to work on this:

  1. Try to upstream some conversion/bufferization pass to ml_program. This is quick and should be uncontroversial, unless people are already preparing to kill that dialect.
  2. Work upstream (RFC on LLVM) to kill that dialect and get the other tools (RFC on torch-mlir) to stop generating it. This is slower, but if it's the path others are leading towards, it's the best outcome.
  3. If all else fails, add a local pass downstream. This is by far the worst solution, but it lets us "worry about this problem" at a later date, and perhaps even serves as a PoC to understand what the problems really are.

I recommend we work in that order.

rengolin avatar Sep 06 '23 09:09 rengolin

Just to update this conversation:

  • I see that torch-dynamo support in torch-mlir also ends up generating MLIR that contains ml_program for input ML models. I found that the ml_program usage in these MLIR files is not dead code. See the attached MNIST example.
  • I also found that there are no upstream conversion patterns for ml_program. IREE seems to contain that code - https://github.com/openxla/iree/blob/9c424c4f4b0ebbba8c47543efb168cadb6e1e07c/compiler/src/iree/compiler/InputConversion/Common/ImportMLProgram.cpp#L82
  • It looks like IREE folks were pushing for this dialect and contributed to MLIR upstream - https://discourse.llvm.org/t/rfc-introduce-ml-program-dialect-and-top-level-ops-proposal-v2/60907

So the bottom line is: if we want PyTorch2 models imported via torch-mlir to work with tpp-opt, we need to get the ml_program dialect handled correctly. Currently, I am hitting a problem with the attached MNIST example.

$ tpp-opt -default-tpp-passes mnist_with_mlprog.mlir > /tmp/x
mnist_with_mlprog.mlir:95:11: error: op was not bufferized
    %18 = ml_program.global_load @global_seed : tensor<i64>
          ^
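A minimal reproducer for this failure mode, based on the `@global_seed` pattern in the error above (the module boilerplate and initializer are illustrative):

```mlir
module {
  ml_program.global private mutable @global_seed(dense<0> : tensor<i64>) : tensor<i64>
  func.func @forward() -> tensor<i64> {
    // This load is live, so it cannot simply be dropped as dead code,
    // and no upstream bufferization pattern handles it.
    %seed = ml_program.global_load @global_seed : tensor<i64>
    return %seed : tensor<i64>
  }
}
```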

nhasabni avatar Oct 12 '23 18:10 nhasabni

https://github.com/llvm/llvm-project/pull/75103

rengolin avatar Jan 29 '24 09:01 rengolin