
[Build] 0.18.0 release breaks Hummingbird build pipeline

Open ksaur opened this issue 9 months ago • 4 comments

Describe the issue

With the release of 0.18.0, we are having issues with the Transpose op:

>           sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
E           onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (/_operators.0/Transpose) Op (Transpose) [TypeInferenceError] Invalid attribute perm {1, 0}, input shape = {

Can you please point us toward the changes that might have broken us? Thank you!

Please see https://github.com/microsoft/hummingbird/issues/770

Urgency

This is blocking the Microsoft Hummingbird runners.

Target platform

all

Build script

This is part of the Hummingbird build which depends on onnxruntime. Can you please point us to the relevant changes in your 0.18.0 build?

Error / output

self = <onnxruntime.capi.onnxruntime_inference_collection.InferenceSession object at 0x7fb91dde3e90>
providers = [], provider_options = [], disabled_optimizers = None

    def _create_inference_session(self, providers, provider_options, disabled_optimizers=None):
        available_providers = C.get_available_providers()
    
        # Tensorrt can fall back to CUDA if it's explicitly assigned. All others fall back to CPU.
        if "TensorrtExecutionProvider" in available_providers:
            if providers and any(
                provider == "CUDAExecutionProvider"
                or (isinstance(provider, tuple) and provider[0] == "CUDAExecutionProvider")
                for provider in providers
            ):
                self._fallback_providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
            else:
                self._fallback_providers = ["CPUExecutionProvider"]
        # MIGraphX can fall back to ROCM if it's explicitly assigned. All others fall back to CPU.
        elif "MIGraphXExecutionProvider" in available_providers:
            if providers and any(
                provider == "ROCMExecutionProvider"
                or (isinstance(provider, tuple) and provider[0] == "ROCMExecutionProvider")
                for provider in providers
            ):
                self._fallback_providers = ["ROCMExecutionProvider", "CPUExecutionProvider"]
            else:
                self._fallback_providers = ["CPUExecutionProvider"]
        else:
            self._fallback_providers = ["CPUExecutionProvider"]
    
        # validate providers and provider_options before other initialization
        providers, provider_options = check_and_normalize_provider_args(
            providers, provider_options, available_providers
        )
    
        session_options = self._sess_options if self._sess_options else C.get_default_session_options()
    
        self._register_ep_custom_ops(session_options, providers, provider_options, available_providers)
    
        if self._model_path:
            sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
        else:
>           sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
E           onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (/_operators.0/Transpose) Op (Transpose) [TypeInferenceError] Invalid attribute perm {1, 0}, input shape = {

Visual Studio Version

No response

GCC / Compiler Version

No response

ksaur avatar May 17 '24 21:05 ksaur

This might be where the error message is coming from: https://github.com/onnx/onnx/blob/990217f043af7222348ca8f0301e17fa7b841781/onnx/defs/tensor/defs.cc#L1116-L1128

edgchen1 avatar May 18 '24 01:05 edgchen1

@snnn @yufenglee @jywu-msft @pranavsharma for visibility

sophies927 avatar May 18 '24 01:05 sophies927

This looks like it's due to an update to the Transpose spec in opset 21. See https://onnx.ai/onnx/operators/text_diff_Transpose_13_21.html for the differences between Transpose opset 13 and opset 21. This sentence was added to the description of the perm attribute: "Its length must be equal to the rank of the input." It looks like that is now being enforced (see @edgchen1's link above). Given the main error message, "[TypeInferenceError] Invalid attribute perm {1, 0}, input shape = {", the input shape seems to be missing? I guess the Transpose nodes in the model don't conform to the new spec.

jywu-msft avatar May 18 '24 02:05 jywu-msft

Thanks so much for the response and for looking into it! :)

In digging a bit more, I see some warnings about [ShapeInferenceError] Inference error(s). Were there any changes to the way dynamic axes work? (I put some debug notes here). Thanks!!

ksaur avatar May 20 '24 04:05 ksaur