onnx-mlir icon indicating copy to clipboard operation
onnx-mlir copied to clipboard

Default make need onnx to be installed

Open AlexandreEichenberger opened this issue 3 years ago • 8 comments

Our default build listed in the README.txt (https://github.com/onnx/onnx-mlir#installation-on-unix) is as follows:

cmake --build .

and this attempts to build stuff for running the check-onnx-backend, which requires onnx. To the best of my understanding, building onnx requires the procedure listed here (https://github.com/onnx/onnx-mlir#installing-third_party-onnx-for-backend-tests-or-rebuilding-onnx-operations).

I don't understand why our CIs don't appear to install onnx from our third_party submodule.

From our discussions today, our default make (listed in the README instructions) should build whatever is needed for an end to end test, namely onnx-mlir. Building also onnx-mlir-opt and running the simple lit tests are probably fine (I am for it)

AlexandreEichenberger avatar Dec 13 '21 22:12 AlexandreEichenberger

The CI does install third_party/onnx. You can see in Dockerfile.onnx-mlir and Dockerfile.onnx-mlir-dev:

RUN ONNX_ROOT=${WORK_DIR}/onnx-mlir/third_party/onnx \
    && cd ${ONNX_ROOT} \
    && python3 setup.py -q install \
    && rm -rf .eggs .setuptools-cmake-build build dist onnx.egg-info

And the default make does produce all the binaries such as onnx-mlir and onnx-mlir-opt.

gongsu832 avatar Dec 14 '21 03:12 gongsu832

@gongsu832 , thanks, that explain the mystery (was grepping for pip install).

On the second point, could I ask you to look into separating the default make (gen of onnx-mlir and onnx-mlir-opt + lit tests that work without onnx) for our default users, plus a more complete target for our CIs and developers? That would be great. If that make could also install onnx, that would be the cherry on top... continuing as is with a bit of explanation in the "developer" section is ok too.

AlexandreEichenberger avatar Dec 14 '21 15:12 AlexandreEichenberger

@AlexandreEichenberger A seemingly simple thing like "install onnx" can turn into a nightmare simply because all the dependencies involved and there are unlimited number of host configurations that this can go run. There is simply no way to write some simple target or script and expect it to work in all these configurations. That's why we have the onnx-mlir-dev image for doing dev work.

And on the issue of the build system, I really don't quite understand why we keep trying to run tests in the default make target. Even MLIR doesn't do that. I understand the desire to make the system as easy to use as possible. But let's not burden the base build system for that. The base build system is for building onnx-mlir, not for teaching you how to use onnx-mlir. You can do that with scripts and docs.

gongsu832 avatar Dec 15 '21 05:12 gongsu832

I am following the readme to build this project (cmake --build .) on ubuntu and it is generating the below error:

FAILED: docs/doc_example/add.so /home/demo/onnx-mlir/build/docs/doc_example/add.so
cd /home/demo/onnx-mlir/build/docs/doc_example && /usr/bin/python3.8 /home/demo/onnx-mlir/build/docs/doc_example/gen_add_onnx.py && /home/demo/onnx-mlir/build/Debug/bin/onnx-mlir add.onnx
Traceback (most recent call last):
  File "/home/demo/onnx-mlir/build/docs/doc_example/gen_add_onnx.py", line 1, in <module>
    import onnx
ModuleNotFoundError: No module named 'onnx'

Came across this thread while looking for a solution for this error.

Not sure how to proceed. Any help is greatly appreciated.

KrishnaPG avatar Dec 30 '21 16:12 KrishnaPG

A simple work around is to go in the build directory and then type make onnx-mlir or make onnx-mlir-opt, use ninja if that is what you use. Unfortunately, the current default make builds way too much.

AlexandreEichenberger avatar Jan 04 '22 22:01 AlexandreEichenberger

@AlexandreEichenberger A seemingly simple thing like "install onnx" can turn into a nightmare simply because all the dependencies involved and there are unlimited number of host configurations that this can go run. There is simply no way to write some simple target or script and expect it to work in all these configurations. That's why we have the onnx-mlir-dev image for doing dev work.

And on the issue of the build system, I really don't quite understand why we keep trying to run tests in the default make target. Even MLIR doesn't do that. I understand the desire to make the system as easy to use as possible. But let's not burden the base build system for that. The base build system is for building onnx-mlir, not for teaching you how to use onnx-mlir. You can do that with scripts and docs.

I would like to see the ALL target for onnx-mlir now execute any tests - and all the test building be configurable via the ONNX_MLIR_BUILD_TESTS. This makes the most sense since it would not require any of the fragile steps (such as installing onnx) and it will match the behavior for llvm and mlir. Right now ONNX_MLIR_BUILD_TESTS only works on some (one) tests, but we could apply it to all of them.

I agree with @gongsu832 that being able to call make or ninja on the project to get all the binaries but no tests makes the most sense. Then we can put the tests behind a check-all target similar to llvm, so that we can execute all tests cleanly with a single target. This would be in addition to all the existing test targets.

sstamenova avatar Jan 08 '22 03:01 sstamenova

That make sense.

  • [ ] All test use ONNX_BUILD_TESTS: how far are we on this?
  • [x] Make build the binaries but not test them. I tested this; typing a make does not invoke any tests as far as I can tell.
  • [ ] Test everything using make check-all does not appear to be implemented.

AlexandreEichenberger avatar Jan 11 '22 15:01 AlexandreEichenberger

ONNX_BUILD_TESTS: this would be fairly easy to implement since the tests it would apply to are probably just the numerical tests at this point. There may be some others, but it shouldn't be terribly difficult.

check-all: this is a little bit more complicated to implement. I took a first pass a while back and the issue I ran into is that there's no single reporting mechanism for the tests, so the results are not merged together (i.e. the lit tests report their failures collectively, the ctests report their failures collectively, etc. but there's not a single summary). It is fairly trivial to add a target that would run all the tests, but the reporting would be suboptimal.

sstamenova avatar Jan 11 '22 18:01 sstamenova