
[Feature Request] Native WebGPU Execution Provider

Open audioXD opened this issue 1 year ago • 9 comments

Describe the feature request

Request: Leverage onnxruntime-web kernels to create a native WebGPU Execution Provider for non-web environments.

Story: I am in a unique situation where my device supports Vulkan, but lacks support for ROCm and CUDA. In the related issue #21917, it seems that Vulkan support was requested, but the discussion appears to have stalled.

Given the progress I've seen with ONNX Runtime in the web environment, I was wondering if the development efforts on the web could be extended to implement a native C++ execution provider. A potential way to achieve this would be by using a library such as wgpu, or more specifically, wgpu-native, which would align well with ONNX Runtime's C++ codebase.

Describe scenario use case

GPUs with no support for ROCm or CUDA, such as older or lower-end GPUs, are currently unable to fully leverage ONNX Runtime's GPU acceleration on Linux. While Windows users have the option to utilize DirectML for GPU support, there is no equivalent solution available for Linux users in this category. These GPUs, while not capable of running ROCm or CUDA, often have Vulkan support, making them suitable candidates for a WebGPU-based execution provider. A native WebGPU Execution Provider would enable efficient ONNX model execution on these devices, particularly in Linux environments, greatly expanding compatibility across platforms without requiring specialized GPU hardware.

audioXD avatar Sep 12 '24 11:09 audioXD

I am currently working on a native WebGPU solution that targets exactly this requirement. Since the JavaScript kernels need to be rewritten in C++, it may take a while.

  • https://github.com/microsoft/onnxruntime/pull/21904

fs-eire avatar Sep 12 '24 19:09 fs-eire

This would be an incredibly valuable addition to ORT

ola-ableton avatar Sep 16 '24 18:09 ola-ableton

Life will be so much better when this comes..

altunenes avatar Oct 28 '24 21:10 altunenes

The WebGPU EP is now in the main branch. Currently it has limited operator coverage, but we are working on it. Simply build onnxruntime with --use_webgpu and call AppendExecutionProvider("WebGPU", ...) in code.
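
For reference, a minimal C++ sketch of that flow, assuming a build produced with --use_webgpu; the model path is a placeholder and the provider options map is left empty:

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "webgpu-demo");

  Ort::SessionOptions options;
  // Register the WebGPU EP by name; nodes it cannot handle fall back to CPU.
  options.AppendExecutionProvider("WebGPU", {});

  // ORT_TSTR handles the char/wchar_t path difference across platforms.
  // "model.onnx" is a placeholder path.
  Ort::Session session(env, ORT_TSTR("model.onnx"), options);
  return 0;
}
```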

fs-eire avatar Dec 12 '24 23:12 fs-eire

@fs-eire I built ONNX Runtime 1.22.0 from source with the WebGPU EP. My model runs, but the Reshape op is not supported. Are there plans to support it?

CaptainDario avatar May 19 '25 17:05 CaptainDario

Reshape does not actually modify the data, so this should not be a problem. Do you have any problem with it?

fs-eire avatar May 19 '25 20:05 fs-eire

The performance seems a little worse than running on CPU; otherwise, it works fine. Specifically, I get this message:

...
[I:onnxruntime:, webgpu_execution_provider.cc:801 GetCapability] webgpu kernel not found in registries for Op type: Reshape node name: node_Reshape_2
[I:onnxruntime:, webgpu_execution_provider.cc:801 GetCapability] webgpu kernel not found in registries for Op type: Reshape node name: node_Reshape_2
...
[W:onnxruntime:, session_state.cc:1280 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.

When running the model on an M1 Pro with WebGPU, my latency is around 80 ms; on CPU it is around 70 ms. Because of the message, I thought this was related to the Reshape op not being assigned to the GPU.

Additionally, with a model that does not produce that message, WebGPU does provide a speedup over CPU: with my second model, inference takes 400 ms on CPU and 300 ms on WebGPU.
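
(For anyone trying to reproduce this: the GetCapability lines above are INFO-level messages, so they only appear if the severity threshold is lowered. A minimal sketch using the standard ORT C++ API; the names here are mine:)

```cpp
// Lower the threshold to INFO so GetCapability messages like
// "webgpu kernel not found in registries for Op type: Reshape" are printed.
Ort::Env env(ORT_LOGGING_LEVEL_INFO, "assignment-check");

Ort::SessionOptions options;
options.SetLogSeverityLevel(ORT_LOGGING_LEVEL_INFO);  // session-level logs too
options.AppendExecutionProvider("WebGPU", {});
```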

CaptainDario avatar May 20 '25 06:05 CaptainDario

@fs-eire Will the WebGPU provider also be available on iOS/Android?

CaptainDario avatar Jun 18 '25 12:06 CaptainDario

It should already work with some models on Android. However, there are some restrictions on mobile (i.e., the max buffer size limit) that block most SLMs in our tests.

We will work on it.

fs-eire avatar Jun 18 '25 20:06 fs-eire

I am also noticing the same or worse performance between the CPU and WebGPU EPs on an Apple M2 Max (onnxruntime-node), with logs showing that various operations are not supported (e.g., Gather, Unsqueeze, Concat, Reshape, Equal, Where, Slice, Mul).

Is this something that is going to be addressed in the future?

nemphys avatar Oct 04 '25 08:10 nemphys