tensorflow-onnx
Option to output ORT format instead of using `large_model` for >2GB models
Context
I have a 3GB model and my end goal is to get it into the ORT model format, so I tried using tf2onnx.convert.from_function with large_model=True, but unfortunately onnxruntime.tools.convert_onnx_models_to_ort doesn't support the file format that large_model produces. Details here: https://github.com/microsoft/onnxruntime/issues/14697
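For reference, a minimal sketch of what the conversion attempt looks like (the model function, shapes, and output path here are placeholders, not the actual 3GB model from the notebook):

```python
import tensorflow as tf
import tf2onnx

# Placeholder stand-in for the real ~3GB model.
model = tf.keras.Sequential([tf.keras.layers.Dense(16)])

@tf.function(input_signature=[tf.TensorSpec([1, 8], tf.float32)])
def model_fn(x):
    return model(x)

# With large_model=True, tf2onnx writes a zip archive holding the graph
# proto plus the weights stored externally, which is needed once the
# model exceeds the 2GB protobuf limit.
model_proto, external_tensor_storage = tf2onnx.convert.from_function(
    model_fn,
    input_signature=[tf.TensorSpec([1, 8], tf.float32)],
    large_model=True,
    output_path="model.zip",
)
```

Running `python -m onnxruntime.tools.convert_onnx_models_to_ort model.zip` on that output is the step that fails, since the tool expects a plain `.onnx` file rather than the zip archive.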
Question
One solution is simply to wait for the onnxruntime team to add support for the large_model format, but I was wondering whether there's a shared 'intermediate' representation used internally by both tf2onnx.convert.from_function and onnxruntime.tools.convert_onnx_models_to_ort, such that tf2onnx.convert.from_function could import the relevant code from onnxruntime.tools and go directly to the ORT format, rather than having to go "through" the regular ONNX format, which is what causes the problem described above.
large_model separates the tensors from the final ONNX graph so that the graph proto itself stays small enough (i.e. under the 2GB protobuf limit).
I suspect the onnx file you got is not the correct one. Could you please share more details about how you converted it?
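To illustrate that separation, here is a small sketch that lists the contents of the large_model output; the entry names are whatever tf2onnx happened to write for a given model, not a documented contract:

```python
import zipfile

# The large_model output from tf2onnx is a zip archive: the (small) graph
# proto and the externalized tensor data are separate entries inside it.
with zipfile.ZipFile("model.zip") as zf:
    for info in zf.infolist():
        print(info.filename, info.file_size)
```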
@fatcat-z Here's a colab that reproduces the problem: https://colab.research.google.com/gist/josephrocca/059d723b4b6b4b36de4ca1388906fe61/scheduler_step.ipynb
I've just tested it to make sure it's still a valid reproduction. It uses the TPU runtime simply because that one has the most RAM; IIRC the normal CPU runtime ran out of RAM.