
[Bug] [Web] [WebGPU] [WebGL] Conv with auto_pad=SAME_UPPER produces wrong values

Open RReverser opened this issue 1 month ago • 2 comments

Describe the issue

As the title says, when a Conv node runs on WebGPU with strides > 1 and auto_pad set to e.g. "SAME_UPPER", it produces values that are far off the expected ones. The Wasm and WebNN backends match the Python values well. I'm guessing the padding in WebGPU has gone wrong?

To reproduce

Here's the absolutely minimal onnxscript-based repro.

repro.py repro.html repro.js

To run, just use repro.py to generate two .onnx models and start a static server with repro.html and repro.js.

You'll see a table like this (I've included WebGL just for completeness, I'm aware it's deprecated):

| | wasm | webgl | webgpu | webnn (cpu) | webnn (gpu) |
|---|---|---|---|---|---|
| Stride 1 | (reference), Range: [8.925, 51.957] | MAD: 0.000, Range: [8.925, 51.957] | MAD: 0.000, Range: [8.925, 51.957] | MAD: 0.000, Range: [8.925, 51.957] | MAD: 0.024, Range: [8.919, 51.927] |
| Stride 2 | (reference), Range: [9.601, 54.654] | MAD: 1.658, Range: [10.279, 53.070] | MAD: 1.658, Range: [10.279, 53.070] | MAD: 0.000, Range: [9.601, 54.654] | MAD: 0.024, Range: [9.595, 54.622] |

plus the dumps of raw data in the console.
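For readers who want to reproduce the two per-backend statistics from the raw dumps, here is a minimal sketch. It assumes "MAD" stands for mean absolute difference against the wasm reference; the actual repro.js may compute it differently (for example as a maximum), and the function names are hypothetical.

```python
def value_range(values):
    """Return (min, max) of a flat list of output values."""
    return min(values), max(values)


def mean_abs_diff(reference, candidate):
    """Mean absolute difference between two flat outputs of equal length.

    Assumption: this is what "MAD" means in the table.
    """
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    return sum(abs(r - c) for r, c in zip(reference, candidate)) / len(reference)


# Made-up numbers, only to show the call shape.
wasm_output = [1.0, 2.0, 3.0]
webgpu_output = [1.0, 2.5, 2.5]
print(value_range(webgpu_output), mean_abs_diff(wasm_output, webgpu_output))
```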

P.S. This one took a long time to narrow down for silly reasons, especially since the original symptom was roughly "the images processed by a UNet model look slightly but visibly wrong, but only when I'm using the dynamic-shape model, despite passing the same dimensions as for the static one".

Initially I suspected the Resize nodes, which proved a waste of time. It turns out that, for the static model, onnxslim was simply replacing auto_pad with explicit, correct pads.

In the end I wrote throwaway scripts to bisect the ONNX graph and dump all intermediate outputs. This helped immensely and let me narrow down the specific node where values diverge between backends very quickly, not counting the time spent writing those scripts.
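The comparison step of such a script could look roughly like the sketch below. It is not the author's script: it assumes the intermediate outputs of each backend have already been dumped into dictionaries keyed by node name, that the node names are listed in execution order, and that a fixed absolute tolerance is good enough. All names here are hypothetical.

```python
def first_divergence(node_order, reference, candidate, tol=1e-3):
    """Return the first node whose outputs differ by more than tol, or None.

    node_order: node names in execution (topological) order.
    reference / candidate: dicts mapping node name -> flat list of floats.
    """
    for name in node_order:
        ref, cand = reference[name], candidate[name]
        if len(ref) != len(cand):
            return name  # a shape mismatch counts as divergence
        worst = max((abs(r - c) for r, c in zip(ref, cand)), default=0.0)
        if worst > tol:
            return name
    return None


# Made-up dumps: the backends agree up to "resize_0" and split at "conv_1".
order = ["resize_0", "conv_1", "relu_2"]
wasm = {"resize_0": [1.0, 2.0], "conv_1": [3.0, 4.0], "relu_2": [3.0, 4.0]}
webgpu = {"resize_0": [1.0, 2.0], "conv_1": [3.5, 4.0], "relu_2": [3.5, 4.0]}
print(first_divergence(order, wasm, webgpu))
```

Scanning in execution order matters: once one node is wrong, everything downstream differs as well, so only the first mismatch points at the culprit.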

It would be great if ONNX Runtime had similar built-in bisect and comparison tools, like Git or C-Reduce do, as that would save days of debugging and help report detailed issues a lot quicker.

Urgency

No response

Platform

Web Browser

OS Version

Chromium 142.0.7444.220

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.23.2; tried the latest @dev available on npm too

ONNX Runtime API

JavaScript

Architecture

Other / Unknown

Execution Provider

Other / Unknown

Execution Provider Library Version

No response

RReverser, Dec 05 '25 00:12

Out of curiosity, I also ran tests with SAME_LOWER padding, and that one works as expected, so only SAME_UPPER is broken in WebGPU & WebGL:

| | wasm | webgl | webgpu |
|---|---|---|---|
| Stride 1, Pad SAME_UPPER | (reference), Range: [8.217, 51.387] | MAD: 0.000, Range: [8.217, 51.387] | MAD: 0.000, Range: [8.217, 51.387] |
| Stride 2, Pad SAME_UPPER | (reference), Range: [9.834, 50.399] | MAD: 1.658, Range: [11.093, 49.791] | MAD: 1.658, Range: [11.093, 49.791] |
| Stride 1, Pad SAME_LOWER | (reference), Range: [8.061, 55.547] | MAD: 0.000, Range: [8.061, 55.547] | MAD: 0.000, Range: [8.061, 55.547] |
| Stride 2, Pad SAME_LOWER | (reference), Range: [10.592, 51.083] | MAD: 0.000, Range: [10.592, 51.083] | MAD: 0.000, Range: [10.592, 51.083] |
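For context on why the stride matters, the sketch below works through the auto_pad arithmetic as described in the ONNX Conv operator documentation: the output size is ceil(input / stride), and when the total padding is odd, the extra element goes at the end for SAME_UPPER and at the beginning for SAME_LOWER. The input size 8 and kernel size 3 are illustrative values, and the helper name is hypothetical. This says nothing about what the WebGPU or WebGL kernels actually do internally.

```python
import math


def same_pads(input_size, kernel_size, stride, dilation=1, mode="SAME_UPPER"):
    """Return (pad_begin, pad_end) for one spatial axis under auto_pad."""
    out_size = math.ceil(input_size / stride)
    effective_kernel = (kernel_size - 1) * dilation + 1
    total = max((out_size - 1) * stride + effective_kernel - input_size, 0)
    small, large = total // 2, total - total // 2
    if mode == "SAME_UPPER":
        return small, large  # extra padding at the end
    if mode == "SAME_LOWER":
        return large, small  # extra padding at the beginning
    raise ValueError(f"unsupported auto_pad mode: {mode}")


for stride in (1, 2):
    for mode in ("SAME_UPPER", "SAME_LOWER"):
        print(stride, mode, same_pads(8, 3, stride, mode=mode))
```

With stride 1 the total padding is 2, so both modes pad one element on each side and cannot be told apart. With stride 2 the total padding is 1, so the two modes put it on opposite sides, which is the only configuration where a mix-up between them would show.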

RReverser, Dec 05 '25 04:12

> Here's the absolutely minimal onnxscript-based repro.

Well, "absolutely" was a bit of an overstatement.

I played a bit more with reducing dimensions, so now it's a simple 1x1x3x3 convolution kernel, basically a grayscale kernel for grayscale images, so that it's easy to visualise.

Applied to random 8x8 images, the visual difference for SAME_UPPER, stride=2, backend=webgpu is quite noticeable:

Image
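To see how much the placement of a single padding row and column can change the result at this size, here is a pure-Python sketch of a single-channel strided convolution with explicit zero padding. It is a hand-written illustration, not the repro code: the image and kernel values are made up, and `conv2d` is a hypothetical helper that takes pads in ONNX order (top, left, bottom, right).

```python
def conv2d(image, kernel, stride, pads):
    """Single-channel cross-correlation with zero padding (as ONNX Conv does)."""
    top, left, bottom, right = pads
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    # Build the zero-padded image.
    padded = [[0.0] * (w + left + right) for _ in range(h + top + bottom)]
    for y in range(h):
        for x in range(w):
            padded[y + top][x + left] = image[y][x]
    out_h = (h + top + bottom - kh) // stride + 1
    out_w = (w + left + right - kw) // stride + 1
    return [
        [
            sum(
                padded[oy * stride + ky][ox * stride + kx] * kernel[ky][kx]
                for ky in range(kh)
                for kx in range(kw)
            )
            for ox in range(out_w)
        ]
        for oy in range(out_h)
    ]


# Made-up 8x8 "image" and an all-ones 3x3 kernel.
image = [[float(y * 8 + x) for x in range(8)] for y in range(8)]
kernel = [[1.0] * 3 for _ in range(3)]

upper = conv2d(image, kernel, 2, (0, 0, 1, 1))  # SAME_UPPER pads for stride 2
lower = conv2d(image, kernel, 2, (1, 1, 0, 0))  # SAME_LOWER pads for stride 2
print(upper[0][0], lower[0][0])
```

Both calls produce a 4x4 output, but every window is shifted by one pixel between them, so the values differ across the whole output rather than only at the border.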

Updated repro code:

repro.html repro.js repro.py

RReverser, Dec 05 '25 05:12

Could someone take a look at this? Unlike some other issues I reported, this one is an actual bug, not just a potential performance improvement.

RReverser, Dec 14 '25 14:12

@wenqinI recently investigated WebGPU Conv auto_pad in #26771 and will probably take a look.

daijh, Dec 15 '25 01:12