[Bug] [Web] [WebGPU] [WebGL] Conv with auto_pad=SAME_UPPER produces wrong values
Describe the issue
As the title says, when a Conv node runs on WebGPU with strides > 1 and auto_pad set to e.g. "SAME_UPPER", it produces values that are far off from the expected ones. The Wasm and WebNN backends match the Python values well. My guess is that padding goes wrong somewhere in WebGPU.
To reproduce
Here's the absolutely minimal onnxscript-based repro.
To run it, use repro.py to generate the two .onnx models, then serve repro.html and repro.js from a static server.
You'll see a table like this (I've included WebGL just for completeness; I'm aware it's deprecated):
| | wasm | webgl | webgpu | webnn (cpu) | webnn (gpu) |
|---|---|---|---|---|---|
| Stride 1 | (reference) Range: [8.925, 51.957] | MAD: 0.000 Range: [8.925, 51.957] | MAD: 0.000 Range: [8.925, 51.957] | MAD: 0.000 Range: [8.925, 51.957] | MAD: 0.024 Range: [8.919, 51.927] |
| Stride 2 | (reference) Range: [9.601, 54.654] | MAD: 1.658 Range: [10.279, 53.070] | MAD: 1.658 Range: [10.279, 53.070] | MAD: 0.000 Range: [9.601, 54.654] | MAD: 0.024 Range: [9.595, 54.622] |
plus dumps of the raw data in the console.
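The tables report each backend's MAD against the wasm output. The repro code isn't included here, so as an assumption MAD is read as the mean absolute difference over all output elements. A hypothetical helper (`compare_to_reference` and `format_cell` are illustrative names) that produces one table cell might look like this:

```python
from typing import Sequence, Tuple


def compare_to_reference(reference: Sequence[float],
                         candidate: Sequence[float]) -> Tuple[float, float, float]:
    """Return (MAD, min, max) of `candidate` against the wasm `reference`.

    MAD is assumed to be the mean absolute difference over all elements.
    """
    if len(reference) != len(candidate) or not reference:
        raise ValueError("outputs must be non-empty and the same length")
    mad = sum(abs(r - c) for r, c in zip(reference, candidate)) / len(reference)
    return mad, min(candidate), max(candidate)


def format_cell(reference: Sequence[float], candidate: Sequence[float]) -> str:
    """Format one cell in the same style as the tables in this issue."""
    mad, lo, hi = compare_to_reference(reference, candidate)
    return f"MAD: {mad:.3f} Range: [{lo:.3f}, {hi:.3f}]"


print(format_cell([1.0, 2.0], [1.5, 2.5]))  # MAD: 0.500 Range: [1.500, 2.500]
```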
P.S. This one took a long time to narrow down, for silly reasons: the original symptom was "images processed by a UNet model look slightly but visibly wrong, but only with the dynamic-shape model, even though I pass the same dimensions as for the static one".
Initially I suspected the Resize nodes, which proved a waste of time. It turns out that for the static model, onnxslim was simply replacing auto_pad with the correct explicit pads.
In the end I wrote throwaway scripts to bisect the ONNX graph and dump all intermediate outputs. This helped immensely: it let me pinpoint the specific node where values diverge between backends very quickly (not counting the time spent writing those scripts).
It would be great if ONNX Runtime had similar built-in bisection and comparison tools, in the spirit of `git bisect` or C-Reduce, as they could save days of debugging and make it much quicker to file detailed issues.
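The comparison half of that bisection can be approximated with a short script once every intermediate tensor has been dumped for two backends (for example, by listing intermediate tensors as extra graph outputs). This sketch assumes the dumps are dicts mapping tensor names to flat lists of floats, and that tensor names are supplied in topological order; `first_divergent_output`, the tolerance, and the sample dumps are all hypothetical, not an ONNX Runtime API.

```python
from typing import Dict, List, Optional, Sequence


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference; 0.0 for empty tensors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a) if a else 0.0


def first_divergent_output(topo_order: List[str],
                           reference: Dict[str, Sequence[float]],
                           candidate: Dict[str, Sequence[float]],
                           tol: float = 0.1) -> Optional[str]:
    """Return the first tensor (in topological order) whose candidate values
    differ from the reference by more than `tol`, or None if all match.

    `tol` must sit above ordinary float noise between backends
    (webnn (gpu) shows a MAD of 0.024 even where results are correct).
    """
    for name in topo_order:
        ref, cand = reference[name], candidate[name]
        if len(ref) != len(cand) or mean_abs_diff(ref, cand) > tol:
            return name
    return None


# Hypothetical dumps in which the Conv output is the first tensor to diverge.
wasm_dump = {"input": [1.0, 2.0], "conv_out": [3.0, 4.0], "relu_out": [3.0, 4.0]}
webgpu_dump = {"input": [1.0, 2.0], "conv_out": [3.9, 4.8], "relu_out": [3.9, 4.8]}
print(first_divergent_output(["input", "conv_out", "relu_out"], wasm_dump, webgpu_dump))  # conv_out
```

A linear scan is enough when all outputs come from a single run; an actual binary search only pays off when each probe needs a separate run, and it relies on the error persisting downstream once it first appears.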
Urgency
No response
Platform
Web Browser
OS Version
Chromium 142.0.7444.220
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.23.2; also tried the latest @dev build available on npm
ONNX Runtime API
JavaScript
Architecture
Other / Unknown
Execution Provider
Other / Unknown
Execution Provider Library Version
No response
Out of curiosity, I also tested SAME_LOWER padding, and it works as expected, so only SAME_UPPER is broken in WebGPU and WebGL:
| | wasm | webgl | webgpu |
|---|---|---|---|
| Stride 1, Pad SAME_UPPER | (reference) Range: [8.217, 51.387] | MAD: 0.000 Range: [8.217, 51.387] | MAD: 0.000 Range: [8.217, 51.387] |
| Stride 2, Pad SAME_UPPER | (reference) Range: [9.834, 50.399] | MAD: 1.658 Range: [11.093, 49.791] | MAD: 1.658 Range: [11.093, 49.791] |
| Stride 1, Pad SAME_LOWER | (reference) Range: [8.061, 55.547] | MAD: 0.000 Range: [8.061, 55.547] | MAD: 0.000 Range: [8.061, 55.547] |
| Stride 2, Pad SAME_LOWER | (reference) Range: [10.592, 51.083] | MAD: 0.000 Range: [10.592, 51.083] | MAD: 0.000 Range: [10.592, 51.083] |
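One observation that may help narrow this down, offered as a guess rather than a diagnosis: under the ONNX Conv spec, auto_pad=SAME_* picks a total padding so that output = ceil(input / stride), and when that total is odd, the extra element goes at the end for SAME_UPPER and at the beginning for SAME_LOWER. For a 3x3 kernel on 8-pixel input (the sizes in the updated repro, assuming dilation 1), the total per axis is 2 at stride 1, so both modes pad symmetrically, but 1 at stride 2, which is exactly the case where the table shows a mismatch. The helper name `same_pads` is hypothetical.

```python
from typing import Tuple


def same_pads(in_size: int, kernel: int, stride: int, dilation: int = 1,
              mode: str = "SAME_UPPER") -> Tuple[int, int]:
    """Return (pad_begin, pad_end) for one spatial axis under auto_pad=SAME_*."""
    out_size = -(-in_size // stride)  # ceil(in_size / stride)
    effective_kernel = (kernel - 1) * dilation + 1
    total = max((out_size - 1) * stride + effective_kernel - in_size, 0)
    small, large = total // 2, total - total // 2
    if mode == "SAME_UPPER":
        return small, large  # odd remainder goes at the end
    if mode == "SAME_LOWER":
        return large, small  # odd remainder goes at the beginning
    raise ValueError(f"unsupported auto_pad mode: {mode}")


for stride in (1, 2):
    for mode in ("SAME_UPPER", "SAME_LOWER"):
        print(stride, mode, same_pads(8, 3, stride, mode=mode))
# 1 SAME_UPPER (1, 1)
# 1 SAME_LOWER (1, 1)
# 2 SAME_UPPER (0, 1)
# 2 SAME_LOWER (1, 0)
```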
> Here's the absolutely minimal onnxscript-based repro.
Well, "absolutely" was a bit of an overstatement.
I played a bit more with reducing dimensions, so now it's a simple 1x1x3x3 convolution kernel: a single-channel (grayscale) kernel applied to grayscale images, which makes the result easy to visualise.
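To check a backend's output for this 1x1x3x3 setup independently of any runtime, a naive reference convolution with explicit pads can help. This is a minimal pure-Python sketch assuming one input and one output channel, no bias, no dilation, and zero padding; `conv2d_single_channel` is a hypothetical name, and the all-ones image below stands in for the random images used in the repro. With an 8x8 input and stride 2, the SAME_UPPER pads (0, 0, 1, 1) and SAME_LOWER pads (1, 1, 0, 0) give outputs whose reduced edge values sit in opposite corners, which makes it easy to see which side a backend actually padded.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def conv2d_single_channel(image: Matrix, kernel: Matrix, stride: int,
                          pads: Tuple[int, int, int, int]) -> Matrix:
    """Naive single-channel 2D Conv (cross-correlation, as in ONNX Conv).

    pads = (top, left, bottom, right), matching ONNX's
    [x1_begin, x2_begin, x1_end, x2_end] order for 2D inputs.
    """
    top, left, bottom, right = pads
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h = (h + top + bottom - kh) // stride + 1
    out_w = (w + left + right - kw) // stride + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for oy in range(out_h):
        for ox in range(out_w):
            acc = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    y = oy * stride + ky - top
                    x = ox * stride + kx - left
                    if 0 <= y < h and 0 <= x < w:  # positions outside the image are zero padding
                        acc += image[y][x] * kernel[ky][kx]
            out[oy][ox] = acc
    return out


ones_image = [[1.0] * 8 for _ in range(8)]
ones_kernel = [[1.0] * 3 for _ in range(3)]
upper = conv2d_single_channel(ones_image, ones_kernel, stride=2, pads=(0, 0, 1, 1))
lower = conv2d_single_channel(ones_image, ones_kernel, stride=2, pads=(1, 1, 0, 0))
print(upper[0], upper[3])  # [9.0, 9.0, 9.0, 6.0] [6.0, 6.0, 6.0, 4.0]
print(lower[0], lower[3])  # [4.0, 6.0, 6.0, 6.0] [6.0, 9.0, 9.0, 9.0]
```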
Applied to random 8x8 images, the visual difference for SAME_UPPER, stride=2, backend=webgpu is quite noticeable:
Updated repro code:
Could someone take a look at this? Unlike some other issues I reported, this one is an actual bug, not just a potential performance improvement.
@wenqinI recently investigated WebGPU Conv auto_pad in #26771 and will probably take a look.