WebGPU: Transpose Conv kernels in Prepack
This change prepacks Conv kernel weights with path-aware transpose decisions, stores the transposed kernels for reuse across runs, and adds ComputeContextBase helpers for node access and GPU buffer unmapping.
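To illustrate the idea (a minimal sketch, not the actual implementation): the prepack-time work amounts to transposing the Conv weight once, up front, into the layout a given shader path prefers, so the per-run shader no longer has to do it. The function name and the OIHW→HWIO layout choice below are hypothetical examples; the real code picks the target layout per execution path.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: convert a Conv weight from OIHW order (the ONNX
// default) into HWIO order. Done once at prepack time, the transposed
// buffer can be cached and reused on every Run instead of being
// re-transposed inside the conv shader.
std::vector<float> TransposeOIHWToHWIO(const std::vector<float>& src,
                                       size_t O, size_t I, size_t H, size_t W) {
  std::vector<float> dst(src.size());
  for (size_t o = 0; o < O; ++o)
    for (size_t i = 0; i < I; ++i)
      for (size_t h = 0; h < H; ++h)
        for (size_t w = 0; w < W; ++w) {
          // src index: ((o*I + i)*H + h)*W + w   (OIHW)
          // dst index: ((h*W + w)*I + i)*O + o   (HWIO)
          dst[((h * W + w) * I + i) * O + o] =
              src[((o * I + i) * H + h) * W + w];
        }
  return dst;
}
```

The "path-aware" part of the PR is in deciding whether this transpose pays off for the selected shader path before committing to the extra prepacked buffer.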
Perf data on LNL (negative variance = reduced inference time):
| model | variance (%) |
|---|---|
| sd-turbo-unet-fp16-demo-layernorm | -23.72% |
| modnet-fp32 | -22.99% |
| sd-turbo-text-encoder-fp16-demo-layernorm | -17.58% |
| efficientnet-lite-f16-demo | -15.28% |
| mobilenetv2-12-f16-demo | -14.18% |
| jina-clip-v1-version | -12.61% |
| gazenet | -12.22% |
| sdunet-v1.5-demo-layernorm | -11.43% |
| modnet-fp16 | -10.06% |
| resnet50-v1-f16-demo | -8.14% |
| florence-2-base-decoder-fp16 | -7.95% |
| movenet-singlepose-thunder-fp32 | -7.61% |
| jina-clip-v1-version-fp16 | -7.54% |
| depth-anything-base-fp32 | -7.45% |
| detr-resnet-50-fp16 | -6.55% |
| detr-resnet-50 | -6.33% |
| jina-clip-v1-text | -6.32% |
| movenet-singlepose-thunder-fp16 | -6.04% |
| mobileclip_s0_vision_fp32 | -5.14% |
@fs-eire @qjia7 @guschmue PTAL
I found the CI error log below. I'm not sure it is actually caused by this PR.

```
2025-12-02T20:34:21.9671092Z 2: [ FAILED ] CudaNhwcTypedTest/0.ConvNhwcBias, where TypeParam = float (186 ms)
2025-12-02T20:34:21.8768402Z 2: 2025-12-02 20:34:21.8759375 [E:onnxruntime:Conv, sequential_executor.cc:572 onnxruntime::ExecuteKernel] Non-zero status code returned while running Conv node. Name:'node1' Status Message: CUDA error cudaErrorNotSupported:operation not supported
2025-12-02T20:34:21.8769974Z 2: E:\_work\onnxruntime\onnxruntime\onnxruntime\test\providers\compare_provider_test_utils.cc(172): error: Value of: _tmp_status.IsOK()
2025-12-02T20:34:21.8770496Z 2:   Actual: false
2025-12-02T20:34:21.8770652Z 2: Expected: true
2025-12-02T20:34:21.8771227Z 2: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Conv node. Name:'node1' Status Message: CUDA error cudaErrorNotSupported:operation not supported
2025-12-02T20:34:21.8771728Z 2:
```
I tried the failing case locally with the CUDA EP; it did not reproduce with this PR applied.
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
@fs-eire PTAL
@qjia7 Really sorry. I totally missed your review comments.
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).