
[Build] 0.18.0 release breaks Hummingbird build pipeline

Open ksaur opened this issue 9 months ago • 4 comments

Describe the issue

With the release of 0.18.0, we are having issues with the Transpose op:

>           sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
E           onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (/_operators.0/Transpose) Op (Transpose) [TypeInferenceError] Invalid attribute perm {1, 0}, input shape = {

Can you please point us toward the changes that might have broken us? Thank you!

Please see https://github.com/microsoft/hummingbird/issues/770

Urgency

This is blocking the Microsoft Hummingbird runners.

Target platform

all

Build script

This is part of the Hummingbird build which depends on onnxruntime. Can you please point us to the relevant changes in your 0.18.0 build?

Error / output

self = <onnxruntime.capi.onnxruntime_inference_collection.InferenceSession object at 0x7fb91dde3e90>
providers = [], provider_options = [], disabled_optimizers = None

    def _create_inference_session(self, providers, provider_options, disabled_optimizers=None):
        available_providers = C.get_available_providers()
    
        # Tensorrt can fall back to CUDA if it's explicitly assigned. All others fall back to CPU.
        if "TensorrtExecutionProvider" in available_providers:
            if providers and any(
                provider == "CUDAExecutionProvider"
                or (isinstance(provider, tuple) and provider[0] == "CUDAExecutionProvider")
                for provider in providers
            ):
                self._fallback_providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
            else:
                self._fallback_providers = ["CPUExecutionProvider"]
        # MIGraphX can fall back to ROCM if it's explicitly assigned. All others fall back to CPU.
        elif "MIGraphXExecutionProvider" in available_providers:
            if providers and any(
                provider == "ROCMExecutionProvider"
                or (isinstance(provider, tuple) and provider[0] == "ROCMExecutionProvider")
                for provider in providers
            ):
                self._fallback_providers = ["ROCMExecutionProvider", "CPUExecutionProvider"]
            else:
                self._fallback_providers = ["CPUExecutionProvider"]
        else:
            self._fallback_providers = ["CPUExecutionProvider"]
    
        # validate providers and provider_options before other initialization
        providers, provider_options = check_and_normalize_provider_args(
            providers, provider_options, available_providers
        )
    
        session_options = self._sess_options if self._sess_options else C.get_default_session_options()
    
        self._register_ep_custom_ops(session_options, providers, provider_options, available_providers)
    
        if self._model_path:
            sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
        else:
>           sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
E           onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (/_operators.0/Transpose) Op (Transpose) [TypeInferenceError] Invalid attribute perm {1, 0}, input shape = {

Visual Studio Version

No response

GCC / Compiler Version

No response

ksaur avatar May 17 '24 21:05 ksaur

This might be where the error message is coming from: https://github.com/onnx/onnx/blob/990217f043af7222348ca8f0301e17fa7b841781/onnx/defs/tensor/defs.cc#L1116-L1128

edgchen1 avatar May 18 '24 01:05 edgchen1

@snnn @yufenglee @jywu-msft @pranavsharma for visibility

sophies927 avatar May 18 '24 01:05 sophies927

This looks like it's due to an update to the Transpose spec in opset 21. See https://onnx.ai/onnx/operators/text_diff_Transpose_13_21.html for the differences between Transpose opset 13 and opset 21. This sentence was added to the description of the perm attribute: "Its length must be equal to the rank of the input." It looks like that is now being enforced (see @edgchen1's link above). Given the main error message, "[TypeInferenceError] Invalid attribute perm {1, 0}, input shape = {", the input shape seems to be missing? I guess the Transpose nodes in the model don't conform to the new spec.

jywu-msft avatar May 18 '24 02:05 jywu-msft

Thanks so much for the response and for looking into it! :)

In digging a bit more, I see some warnings about [ShapeInferenceError] Inference error(s). Were there any changes to the way dynamic axes work? (I put some debug notes here). Thanks!!

ksaur avatar May 20 '24 04:05 ksaur