WebGPU: Transpose Conv kernels in Prepack
This change prepacks Conv kernel weights with path-aware transpose decisions, stores the transposed kernels for reuse across runs, and adds ComputeContextBase helpers for node access and GPU buffer unmapping.
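To illustrate the idea (a minimal sketch, not the actual implementation): the prepack-time work amounts to transposing the Conv weight once, up front, into the layout a given shader path prefers, so the per-run shader no longer has to do it. The function name and the OIHW→HWIO layout choice below are hypothetical examples; the real code picks the target layout per execution path.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: convert a Conv weight from OIHW order (the ONNX
// default) into HWIO order. Done once at prepack time, the transposed
// buffer can be cached and reused on every Run instead of being
// re-transposed inside the conv shader.
std::vector<float> TransposeOIHWToHWIO(const std::vector<float>& src,
                                       size_t O, size_t I, size_t H, size_t W) {
  std::vector<float> dst(src.size());
  for (size_t o = 0; o < O; ++o)
    for (size_t i = 0; i < I; ++i)
      for (size_t h = 0; h < H; ++h)
        for (size_t w = 0; w < W; ++w) {
          // src index: ((o*I + i)*H + h)*W + w   (OIHW)
          // dst index: ((h*W + w)*I + i)*O + o   (HWIO)
          dst[((h * W + w) * I + i) * O + o] =
              src[((o * I + i) * H + h) * W + w];
        }
  return dst;
}
```

The "path-aware" part of the PR is in deciding whether this transpose pays off for the selected shader path before committing to the extra prepacked buffer.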
Perf data on LNL (negative variance = reduced inference time):
| model | variance (%) |
|---|---|
| sd-turbo-unet-fp16-demo-layernorm | -23.72% |
| modnet-fp32 | -22.99% |
| sd-turbo-text-encoder-fp16-demo-layernorm | -17.58% |
| efficientnet-lite-f16-demo | -15.28% |
| mobilenetv2-12-f16-demo | -14.18% |
| jina-clip-v1-version | -12.61% |
| gazenet | -12.22% |
| sdunet-v1.5-demo-layernorm | -11.43% |
| modnet-fp16 | -10.06% |
| resnet50-v1-f16-demo | -8.14% |
| florence-2-base-decoder-fp16 | -7.95% |
| movenet-singlepose-thunder-fp32 | -7.61% |
| jina-clip-v1-version-fp16 | -7.54% |
| depth-anything-base-fp32 | -7.45% |
| detr-resnet-50-fp16 | -6.55% |
| detr-resnet-50 | -6.33% |
| jina-clip-v1-text | -6.32% |
| movenet-singlepose-thunder-fp16 | -6.04% |
| mobileclip_s0_vision_fp32 | -5.14% |
@fs-eire @qjia7 @guschmue PTAL
I found the CI error log below. I'm not sure it is actually caused by this PR.

```
2025-12-02T20:34:21.9671092Z 2: [ FAILED ] CudaNhwcTypedTest/0.ConvNhwcBias, where TypeParam = float (186 ms)
2025-12-02T20:34:21.8768402Z 2: 2025-12-02 20:34:21.8759375 [E:onnxruntime:Conv, sequential_executor.cc:572 onnxruntime::ExecuteKernel] Non-zero status code returned while running Conv node. Name:'node1' Status Message: CUDA error cudaErrorNotSupported:operation not supported
2025-12-02T20:34:21.8769974Z 2: E:\_work\onnxruntime\onnxruntime\onnxruntime\test\providers\compare_provider_test_utils.cc(172): error: Value of: _tmp_status.IsOK()
2025-12-02T20:34:21.8770496Z 2:   Actual: false
2025-12-02T20:34:21.8770652Z 2: Expected: true
2025-12-02T20:34:21.8771227Z 2: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Conv node. Name:'node1' Status Message: CUDA error cudaErrorNotSupported:operation not supported
2025-12-02T20:34:21.8771728Z 2:
```
I tried the failing case locally with the CUDA EP; it did not reproduce with this PR applied.
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
@fs-eire PTAL
@qjia7 Really sorry. I totally missed your review comments.
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).