[Performance] Transpose optimizer inserts unnecessary transposes
Describe the issue
When looking at the optimized model, I noticed that the transpose optimizer replaced a single input transpose and a single output transpose with many intermediate transposes, which likely makes U-Net-style models like ours considerably less efficient than they could be.
To reproduce
Convert any U-Net model from TensorFlow to ONNX via tf2onnx. The converter naturally inserts an NHWC->NCHW Transpose node at the beginning:
and the opposite NCHW->NHWC Transpose node at the end:
The encoder pass of the U-Net in the middle looks reasonable, with each section looking like this:
However, after optimisations and saving the optimised model via the optimizedModelFilePath option, all those sections get wrapped in tons of intermediate transposes:
As far as I can tell, all these transposes should be easy to eliminate by simply reordering the scales of the Resize node and changing the axis of the Concat node.
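To illustrate the remapping I have in mind: pushing the layout transpose through Concat and Resize amounts to permuting per-axis attributes. A small numpy sketch of the idea (the helper names `remap_concat_axis` and `remap_resize_scales` are mine, not part of any library):

```python
import numpy as np

# Pushing Transpose(perm) through Concat: concatenating transposed
# tensors along axis `a` equals transposing the concatenation of the
# originals along axis perm[a].
def remap_concat_axis(perm, axis):
    return perm[axis]

# Resize scales are per-axis, so a transposed input just needs its
# scales permuted the same way: scales_new[i] = scales_old[perm[i]].
def remap_resize_scales(perm, scales):
    return [scales[p] for p in perm]

perm = [0, 3, 1, 2]  # NHWC -> NCHW
a = np.random.rand(1, 4, 4, 3)
b = np.random.rand(1, 4, 4, 3)

# Concat on the NCHW channel axis 1 == transpose of Concat on the
# NHWC channel axis perm[1] == 3.
lhs = np.concatenate([a.transpose(perm), b.transpose(perm)], axis=1)
rhs = np.concatenate([a, b], axis=remap_concat_axis(perm, 1)).transpose(perm)
assert np.array_equal(lhs, rhs)

# NHWC scales [1, 2, 2, 1] become NCHW scales [1, 1, 2, 2].
assert remap_resize_scales(perm, [1.0, 2.0, 2.0, 1.0]) == [1.0, 1.0, 2.0, 2.0]
```

So both rewrites are pure attribute changes and require no extra Transpose nodes at all.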
Urgency
No response
Platform
Web Browser
OS Version
N/A
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.23.2
ONNX Runtime API
JavaScript
Architecture
Other / Unknown
Execution Provider
Other / Unknown
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
No
As a quick experiment / workaround, I've used the onnxscript rewriter API to insert dummy opposite transposes around each Resize node before saving the model, and manually reordered the scales:
```python
def rewrite(self, op, x, roi, sizes):
    # Flip the Resize input from NCHW back to NHWC...
    transposed_input = op.Transpose(
        x,
        perm=[0, 2, 3, 1],
    )
    # ...resize in NHWC layout; the hardcoded scales [1, 2, 2, 1] are
    # the NCHW scales [1, 1, 2, 2] manually reordered for NHWC (the
    # matched `sizes` input is dropped in favour of explicit scales).
    output = op.Resize(
        transposed_input,
        roi,
        op.Constant(value_floats=[1.0, 2.0, 2.0, 1.0]),
        coordinate_transformation_mode="asymmetric",
        nearest_mode="floor",
    )
    # ...and flip back to NCHW, so this pair can cancel against the
    # transposes inserted by the transpose optimizer.
    return op.Transpose(
        output,
        perm=[0, 3, 1, 2],
    )
```
Now, after onnxslim's static optimisations, the input model looks like this:
And saving the optimized model via optimizedModelFilePath shows that the transposes inserted by the transpose optimizer and my own successfully cancel out, eliminating all transposes from the graph:
While this is purely an experiment with hardcoded values, it at least demonstrates that this is a viable optimisation.