
Option to output ORT format instead of using `large_model` for >2GB models

Open josephrocca opened this issue 2 years ago • 2 comments

Context

I have a 3GB model and my end goal is to get the model into the ORT model format, so I tried using tf2onnx.convert.from_function with large_model=True, but unfortunately onnxruntime.tools.convert_onnx_models_to_ort doesn't support the large_model file type. Details here: https://github.com/microsoft/onnxruntime/issues/14697
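For reference, here is a minimal sketch of the conversion attempt described above (the toy function, input signature, and paths are placeholders, not the real 3GB model):

```python
import tensorflow as tf
import tf2onnx

# Placeholder standing in for the real >2GB model
@tf.function
def f(x):
    return x * 2.0

# large_model=True writes a zip archive (graph proto + external tensor files)
model_proto, external_storage = tf2onnx.convert.from_function(
    f,
    input_signature=[tf.TensorSpec([1, 4], tf.float32, name="x")],
    large_model=True,
    output_path="model.onnx.zip",
)

# This is the step that fails: the ORT converter expects a regular .onnx file,
# not the large_model zip archive.
#   python -m onnxruntime.tools.convert_onnx_models_to_ort model.onnx.zip
```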

Question

One solution here is to just wait for the onnxruntime team to add support for the large_model format, but I was wondering whether there's a shared 'intermediate' format used internally by both tf2onnx.convert.from_function and onnxruntime.tools.convert_onnx_models_to_ort, such that tf2onnx.convert.from_function could simply import the relevant code from onnxruntime.tools and go directly to the ORT format. I.e. rather than having to go "through" the regular ONNX format, which is what causes the problem described above.

josephrocca avatar Feb 15 '23 16:02 josephrocca

large_model will try to separate the tensors from the final ONNX graph so that the graph itself can stay small enough (the model title mentions the >2GB limit on a single ONNX protobuf).
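To illustrate what "separating the tensors" means, here is a minimal sketch of ONNX's external-data mechanism (paths are placeholders, and this is illustrative of the idea only, not a workaround for the zip format itself); large_model stores the graph and tensors in a similarly separated fashion, but packaged inside a zip archive:

```python
import onnx

model = onnx.load("model.onnx")          # graph plus weights in one protobuf

onnx.save_model(
    model,
    "model_small.onnx",                  # graph only, now much smaller
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="model_small.onnx.data",    # weights live in this sidecar file
    size_threshold=1024,                 # tensors larger than 1KB are moved out
)
```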

I suspect the onnx file you got is not the correct one. Could you please share more details about how you converted it?

fatcat-z avatar Mar 17 '23 04:03 fatcat-z

@fatcat-z Here's a colab that reproduces the problem: https://colab.research.google.com/gist/josephrocca/059d723b4b6b4b36de4ca1388906fe61/scheduler_step.ipynb

I've tested it just now to make sure that it's still a valid reproduction. It uses the TPU runtime simply because that one has the most RAM; IIRC the normal CPU runtime ran out of RAM.

josephrocca avatar Mar 18 '23 04:03 josephrocca