[Build] 1.18.0 release breaks Hummingbird build pipeline
Describe the issue
With the release of 1.18.0, we are having issues with the Transpose op:
> sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
E onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (/_operators.0/Transpose) Op (Transpose) [TypeInferenceError] Invalid attribute perm {1, 0}, input shape = {
Can you please point us to the changes that might have broken us? Thank you!
Please see https://github.com/microsoft/hummingbird/issues/770
Urgency
This is blocking the Microsoft Hummingbird runners.
Target platform
all
Build script
This is part of the Hummingbird build, which depends on onnxruntime. Can you please point us to the relevant changes in your 1.18.0 build?
Error / output
self = <onnxruntime.capi.onnxruntime_inference_collection.InferenceSession object at 0x7fb91dde3e90>
providers = [], provider_options = [], disabled_optimizers = None
    def _create_inference_session(self, providers, provider_options, disabled_optimizers=None):
        available_providers = C.get_available_providers()
        # Tensorrt can fall back to CUDA if it's explicitly assigned. All others fall back to CPU.
        if "TensorrtExecutionProvider" in available_providers:
            if providers and any(
                provider == "CUDAExecutionProvider"
                or (isinstance(provider, tuple) and provider[0] == "CUDAExecutionProvider")
                for provider in providers
            ):
                self._fallback_providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
            else:
                self._fallback_providers = ["CPUExecutionProvider"]
        # MIGraphX can fall back to ROCM if it's explicitly assigned. All others fall back to CPU.
        elif "MIGraphXExecutionProvider" in available_providers:
            if providers and any(
                provider == "ROCMExecutionProvider"
                or (isinstance(provider, tuple) and provider[0] == "ROCMExecutionProvider")
                for provider in providers
            ):
                self._fallback_providers = ["ROCMExecutionProvider", "CPUExecutionProvider"]
            else:
                self._fallback_providers = ["CPUExecutionProvider"]
        else:
            self._fallback_providers = ["CPUExecutionProvider"]
        # validate providers and provider_options before other initialization
        providers, provider_options = check_and_normalize_provider_args(
            providers, provider_options, available_providers
        )
        session_options = self._sess_options if self._sess_options else C.get_default_session_options()
        self._register_ep_custom_ops(session_options, providers, provider_options, available_providers)
        if self._model_path:
            sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
        else:
>           sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
E           onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (/_operators.0/Transpose) Op (Transpose) [TypeInferenceError] Invalid attribute perm {1, 0}, input shape = {
This might be where the error message is coming from: https://github.com/onnx/onnx/blob/990217f043af7222348ca8f0301e17fa7b841781/onnx/defs/tensor/defs.cc#L1116-L1128
@snnn @yufenglee @jywu-msft @pranavsharma for visibility
This looks like it's due to an update to the Transpose spec in opset 21.
See https://onnx.ai/onnx/operators/text_diff_Transpose_13_21.html for the difference between Transpose opset 13 and opset 21.
This sentence was added to the description of the perm attribute:
"Its length must be equal to the rank of the input."
and it looks like that constraint is now being enforced (see @edgchen1's link above). A quick way to check a model against it is sketched below.
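A rough sketch of such a check: scan the graph for Transpose nodes whose perm length disagrees with the input's rank. This is an assumption-laden helper, not part of onnxruntime — it only sees tensors that carry shape info in the graph, and "model.onnx" is a placeholder path.

```python
import onnx

# Placeholder path: the Hummingbird-exported model under investigation.
model = onnx.load("model.onnx")

# Map tensor names to ranks, where shape information is recorded in the graph.
ranks = {}
for vi in list(model.graph.input) + list(model.graph.value_info) + list(model.graph.output):
    tt = vi.type.tensor_type
    if tt.HasField("shape"):
        ranks[vi.name] = len(tt.shape.dim)

# Flag Transpose nodes that would violate the opset 21 constraint:
# len(perm) must equal the rank of the input.
for node in model.graph.node:
    if node.op_type != "Transpose":
        continue
    perm = next((list(a.ints) for a in node.attribute if a.name == "perm"), None)
    rank = ranks.get(node.input[0])
    if perm is not None and rank is not None and len(perm) != rank:
        print(f"{node.name or node.output[0]}: perm={perm}, input rank={rank}")
```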
From the main error message,
"[TypeInferenceError] Invalid attribute perm {1, 0}, input shape = {"
the input shape seems to be missing (empty)? I guess the Transpose nodes in the model don't conform to the new spec.
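For reference, here is a minimal sketch that appears to reproduce the same failure, assuming a rank-0 input is enough to trigger the mismatch; the opset 21 target and the CPU provider are my choices for the repro, not details from the original report.

```python
from onnx import TensorProto, helper
import onnxruntime as ort

# Rank-0 (scalar) input feeding a Transpose whose perm has length 2:
# len(perm) != input rank, which the updated spec rejects.
inp = helper.make_tensor_value_info("X", TensorProto.FLOAT, [])
out = helper.make_tensor_value_info("Y", TensorProto.FLOAT, None)
node = helper.make_node("Transpose", ["X"], ["Y"], perm=[1, 0])
graph = helper.make_graph([node], "transpose_repro", [inp], [out])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 21)])

# Session creation should fail during type/shape inference with something like:
# [TypeInferenceError] Invalid attribute perm {1, 0}, input shape = {}
ort.InferenceSession(model.SerializeToString(), providers=["CPUExecutionProvider"])
```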
Thanks so much for the response and for looking into it! :)
Digging a bit more, I see some warnings about [ShapeInferenceError] Inference error(s). Were there any changes to the way dynamic axes work? (I put some debug notes here.) Thanks!!
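One way to surface those warnings directly, rather than through session creation, is to run ONNX shape inference on the exported model with strict mode on. A minimal sketch, assuming the onnx package that ships with the environment; "model.onnx" is again a placeholder path:

```python
import onnx

# strict_mode=True turns "[ShapeInferenceError] Inference error(s)" warnings
# into a hard error, which makes the offending node easier to pin down.
model = onnx.load("model.onnx")
try:
    onnx.shape_inference.infer_shapes(model, check_type=True, strict_mode=True)
    print("shape inference passed")
except onnx.shape_inference.InferenceError as e:
    print("shape inference failed:", e)
```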