[Performance] Transpose optimizer inserts unnecessary transposes
Describe the issue
When looking at the optimized model, I noticed that the transpose optimizer replaced a single input transpose and a single output transpose with many intermediate transposes, which likely makes U-Net-style models like ours considerably less efficient than they could be.
To reproduce
Convert any U-Net model from TensorFlow to ONNX via tf2onnx. The converter naturally inserts an NHWC->NCHW Transpose node at the beginning:
and the opposite NCHW->NHWC Transpose node at the end:
The encoder pass of the U-Net in the middle looks reasonable, with each section looking like this:
However, after optimisations and saving the optimised model via the optimizedModelFilePath option, all those sections get wrapped in tons of intermediate transposes:
As far as I can tell, all these transposes should be easy to eliminate by simply reordering the scales of the Resize node and changing the axis of the Concat node.
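To illustrate the remapping I have in mind: pushing the layout transpose through Concat and Resize amounts to permuting per-axis attributes. A small numpy sketch of the idea (the helper names `remap_concat_axis` and `remap_resize_scales` are mine, not part of any library):

```python
import numpy as np

# Pushing Transpose(perm) through Concat: concatenating transposed
# tensors along axis `a` equals transposing the concatenation of the
# originals along axis perm[a].
def remap_concat_axis(perm, axis):
    return perm[axis]

# Resize scales are per-axis, so a transposed input just needs its
# scales permuted the same way: scales_new[i] = scales_old[perm[i]].
def remap_resize_scales(perm, scales):
    return [scales[p] for p in perm]

perm = [0, 3, 1, 2]  # NHWC -> NCHW
a = np.random.rand(1, 4, 4, 3)
b = np.random.rand(1, 4, 4, 3)

# Concat on the NCHW channel axis 1 == transpose of Concat on the
# NHWC channel axis perm[1] == 3.
lhs = np.concatenate([a.transpose(perm), b.transpose(perm)], axis=1)
rhs = np.concatenate([a, b], axis=remap_concat_axis(perm, 1)).transpose(perm)
assert np.array_equal(lhs, rhs)

# NHWC scales [1, 2, 2, 1] become NCHW scales [1, 1, 2, 2].
assert remap_resize_scales(perm, [1.0, 2.0, 2.0, 1.0]) == [1.0, 1.0, 2.0, 2.0]
```

So both rewrites are pure attribute changes and require no extra Transpose nodes at all.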
Urgency
No response
Platform
Web Browser
OS Version
N/A
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.23.2
ONNX Runtime API
JavaScript
Architecture
Other / Unknown
Execution Provider
Other / Unknown
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
No
As a quick experiment / workaround, I've used the onnxscript rewriter API to insert dummy opposite transposes around each Resize node before saving the model, and manually reordered the scales:
```python
def rewrite(self, op, x, roi, sizes):
    # Flip the Resize input from NCHW back to NHWC...
    transposed_input = op.Transpose(
        x,
        perm=[0, 2, 3, 1],
    )
    # ...resize in NHWC layout; the hardcoded scales [1, 2, 2, 1] are
    # the NCHW scales [1, 1, 2, 2] manually reordered for NHWC (the
    # matched `sizes` input is dropped in favour of explicit scales).
    output = op.Resize(
        transposed_input,
        roi,
        op.Constant(value_floats=[1.0, 2.0, 2.0, 1.0]),
        coordinate_transformation_mode="asymmetric",
        nearest_mode="floor",
    )
    # ...and flip back to NCHW, so this pair can cancel against the
    # transposes inserted by the transpose optimizer.
    return op.Transpose(
        output,
        perm=[0, 3, 1, 2],
    )
```
Now, after onnxslim's static optimisations, the input model looks like this:
And saving the optimized model via optimizedModelFilePath shows that the transposes inserted by the transpose optimizer and my own successfully cancel out, eliminating all transposes from the graph:
While this is purely an experiment with hardcoded values, it at least demonstrates that this is a viable optimisation.