[Feature Request] Native WebGPU Execution Provider
Describe the feature request
Request:
Leverage onnxruntime-web kernels to create a native WebGPU Execution Provider for non-web environments.
Story: I am in a unique situation where my device supports Vulkan, but lacks support for ROCm and CUDA. In the related issue #21917, it seems that Vulkan support was requested, but the discussion appears to have stalled.
Given the progress I've seen with ONNX Runtime in the web environment, I was wondering if the development efforts on the web could be extended to implement a native C++ execution provider. A potential way to achieve this would be by using a library such as wgpu, or more specifically, wgpu-native, which would align well with ONNX Runtime's C++ codebase.
Describe scenario use case
GPUs with no support for ROCm or CUDA, such as older or lower-end GPUs, are currently unable to fully leverage ONNX Runtime's GPU acceleration on Linux. While Windows users have the option to utilize DirectML for GPU support, there is no equivalent solution available for Linux users in this category. These GPUs, while not capable of running ROCm or CUDA, often have Vulkan support, making them suitable candidates for a WebGPU-based execution provider. A native WebGPU Execution Provider would enable efficient ONNX model execution on these devices, particularly in Linux environments, greatly expanding compatibility across platforms without requiring specialized GPU hardware.
I am currently working on a native WebGPU solution that targets exactly this requirement. Since it requires rewriting the JavaScript kernels in C++, it may take a while.
- https://github.com/microsoft/onnxruntime/pull/21904
This would be an incredibly valuable addition to ORT
Life will be so much better when this comes..
The WebGPU EP is now in the main branch. It currently has limited operator coverage, but we are working on it. Simply build onnxruntime with `--use_webgpu` and use `AppendExecutionProvider("WebGPU", ...)` in code.
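For anyone trying this out, here is a minimal sketch of the two steps above. The build flag and the `AppendExecutionProvider("WebGPU", ...)` call come from the comment; the model path and the empty provider-options map are placeholders, and this assumes a local onnxruntime build with WebGPU enabled:

```cpp
// Build ONNX Runtime with the WebGPU EP enabled, e.g.:
//   ./build.sh --config Release --build_shared_lib --use_webgpu
//
// Then register the EP by name when creating a session:
#include <onnxruntime_cxx_api.h>

#include <string>
#include <unordered_map>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "webgpu-demo");
  Ort::SessionOptions session_options;

  // Register the WebGPU EP. An empty options map is used here;
  // provider-specific key/value options can be passed instead.
  std::unordered_map<std::string, std::string> provider_options;
  session_options.AppendExecutionProvider("WebGPU", provider_options);

  // "model.onnx" is a placeholder path to your model file.
  Ort::Session session(env, "model.onnx", session_options);
  return 0;
}
```

Nodes without a WebGPU kernel fall back to the CPU EP, so the session still runs even with the current limited operator coverage.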
@fs-eire I built ONNX Runtime 1.22.0 from source with the WebGPU EP. My model runs, but the Reshape op is not supported. Are there plans to support this?
Reshape does not actually modify the data, so this should not be a problem. Are you seeing any issue with it?
The performance seems a little worse than running on CPU; otherwise, it works fine. Specifically, I get this message:
...
[I:onnxruntime:, webgpu_execution_provider.cc:801 GetCapability] webgpu kernel not found in registries for Op type: Reshape node name: node_Reshape_2
[I:onnxruntime:, webgpu_execution_provider.cc:801 GetCapability] webgpu kernel not found in registries for Op type: Reshape node name: node_Reshape_2
...
[W:onnxruntime:, session_state.cc:1280 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
When running the model on an M1 Pro with WebGPU, my latency is around 80 ms; on CPU it's around 70 ms. Because of the message, I thought it was related to the Reshape op not being assigned to the GPU.
Additionally, with a model that does not show that message, WebGPU does provide a speedup over CPU: inference takes 400 ms on CPU and 300 ms on WebGPU.
@fs-eire will the webgpu provider also be available for iOS/Android?
It should already work with some models on Android. However, there are some restrictions on mobile (e.g. the max buffer size limit) blocking most SLMs in our tests.
We will work on it.
I am also noticing the same or worse performance between the CPU and WebGPU EPs on an Apple M2 Max (onnxruntime-node), with logs reporting that various operations are not supported (e.g. Gather, Unsqueeze, Concat, Reshape, Equal, Where, Slice, Mul).
Is this something that is going to be addressed in the future?