onnxruntime icon indicating copy to clipboard operation
onnxruntime copied to clipboard

Allow link to system-installed onnx.

Open xkszltl opened this issue 2 years ago • 13 comments

When using shared protobuf, both onnx and ort register to the same place. This patch ensures both sides go through the same proto .so and only load once.

xkszltl avatar Aug 03 '22 12:08 xkszltl

This is a new feature request. Please let our PMs know why it is needed. I can't make any move before having a formal discussion with the PMs and dev managers.

snnn avatar Sep 07 '22 04:09 snnn

Thanks for the update.

Please let our PMs know why it is needed.

@snnn Could you help driving the discussion and clarify what kind of "why" not covered by the current PR description is missing? You may include the PM in this thread.

xkszltl avatar Sep 07 '22 05:09 xkszltl

The background here is if we load onnx+ort, especially the one built with shared pb, together into memory, they'll conflict because both carry onnx schemas under the same name and reg to the same registry. By link to the shared system onnx, it'll only load once by loader to resolve symbols without duplication.

xkszltl avatar Sep 07 '22 05:09 xkszltl

/azp run Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux Nuphar CI Pipeline,Linux OpenVINO CI Pipeline

skottmckay avatar Sep 08 '22 23:09 skottmckay

/azp run MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-python-checks-ci-pipeline,onnxruntime-binary-size-checks-ci-pipeline

skottmckay avatar Sep 08 '22 23:09 skottmckay

Azure Pipelines successfully started running 7 pipeline(s).

azure-pipelines[bot] avatar Sep 08 '22 23:09 azure-pipelines[bot]

Azure Pipelines successfully started running 9 pipeline(s).

azure-pipelines[bot] avatar Sep 08 '22 23:09 azure-pipelines[bot]

@faxu, any update please?

wschin avatar Sep 15 '22 17:09 wschin

Hi @xkszltl and thank you for your suggested contribution. In order to keep the codebase manageable, we request that these 4 criteria are met:

  1. Confirm that this change does not impact the regular/standard workflow.
  2. Confirm that this change does not make the code unnecessarily complex and unmaintainable to solve, given that this change is only for one use case.
  3. Ensure that here are tests to validate the changes (wherever possible). Please elaborate on how we can test this to ensure it's correctly working, and whether it's possible to automate this test.
  4. Ensure that the feature is documented so that it's clear how others can benefit from this as well. Please add this to your PR through code annotation.

faxu avatar Sep 16 '22 20:09 faxu

  1. Depends on the definition of "standard", could you clarify? More specific, this changes the onnx used when -D onnxruntime_PREFER_SYSTEM_LIB=ON, because ort currently doesn't have finer-grain switches. But is onnxruntime_PREFER_SYSTEM_LIB used by the "standard workflow"?
  2. It doesn't, there's only 15 lines of non-whitespace code, to add a extra branch. And it's more of a bugfix than a feature, so not gonna be "unnecessarily". Actually I'm seen multiple requests on the related changes when using shared protobuf.
  3. At this point it's tested in our side. I would certainly love to have CI in ort as well but it depends on custom build of onnx, which I don't think ort has in CI at this point? Any suggestions on that?
  4. Will do. Regarding "code annotation", you mean comment in code or on github?

xkszltl avatar Sep 16 '22 22:09 xkszltl

@snnn / @pranavsharma can you review comments 1 and 3?

For 4 - describing this as a comment in the code should be ok for a paper trail, unless you have another suggestion for a good place to document this and how to use.

faxu avatar Sep 16 '22 23:09 faxu

  1. Depends on the definition of "standard", could you clarify? More specific, this changes the onnx used when -D onnxruntime_PREFER_SYSTEM_LIB=ON, because ort currently doesn't have finer-grain switches. But is onnxruntime_PREFER_SYSTEM_LIB used by the "standard workflow"?

We don't use onnxruntime_PREFER_SYSTEM_LIB in the regular workflow.

  1. At this point it's tested in our side. I would certainly love to have CI in ort as well but it depends on custom build of onnx, which I don't think ort has in CI at this point? Any suggestions on that?

We won't have the resources to create a CI on our side.

pranavsharma avatar Sep 16 '22 23:09 pranavsharma

We regularly need to refactor our code. So, when we make changes, how do we verify if this still works? If you could give us some instructions, it would be very helpful.

snnn avatar Sep 17 '22 02:09 snnn

The workflow would be to compile protobuf as shared lib, and compile onnx to use that. These are the script we used for that:

  • https://github.com/xkszltl/Roaster/blob/b6e950c0a67eac014a1dcff8f084ce71e03f6364/pkgs/protobuf.sh#L47-L65
  • https://github.com/xkszltl/Roaster/blob/b6e950c0a67eac014a1dcff8f084ce71e03f6364/pkgs/onnx.sh#L112-L133
  • https://github.com/xkszltl/Roaster/blob/b6e950c0a67eac014a1dcff8f084ce71e03f6364/pkgs/ort.sh#L106-L153

and to test the build, you can simply load both libs from onnx+ort, or if python parts are built as in the rest part of script, we simple run:

  • https://github.com/xkszltl/Roaster/blob/b6e950c0a67eac014a1dcff8f084ce71e03f6364/pkgs/ort.sh#L215-L232

Since you don't have CI resources for coverage and the related parts was never covered, I would suggest to merge this without testing first, as that would at least unblock people depending on it.

xkszltl avatar Oct 16 '22 12:10 xkszltl