wonnx
Default device if no gpu?
Hi there,
I read that wonnx can use gpu through graphics apis like metal and vulkan. Just wondering, does it default to cpu inference if there is no gpu?
Thanks
WONNX requires a GPU for inference and does not implement CPU inference itself.
However, on some systems without a hardware GPU, a 'software-emulated GPU' may be available (look up Lavapipe/LLVMpipe) which WONNX can use (and that would effectively be CPU inference).
Additionally, the WONNX CLI can fall back to CPU inference using the tract crate (this completely bypasses WONNX).
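For anyone wanting to check which situation applies up front, here is a minimal sketch using the wgpu crate that wonnx builds on (note the exact wgpu API differs between versions; this follows the 0.13/0.14-era style, where request_adapter returns an Option):

```rust
// Sketch only: probe wgpu for any adapter (hardware or software-emulated)
// before choosing between GPU inference (wonnx) and a CPU fallback.
async fn gpu_available() -> bool {
    let instance = wgpu::Instance::new(wgpu::Backends::all());
    instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            power_preference: wgpu::PowerPreference::HighPerformance,
            // Set this to true to *only* accept a software adapter
            // such as Lavapipe/LLVMpipe.
            force_fallback_adapter: false,
            compatible_surface: None,
        })
        .await
        .is_some()
}
```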
Hi there, I just came across this crate while looking for a Rust-native solution for inference on the web. Completely understand the focus on wgpu-driven inference here, but do you have any plans to / know of anyone who is offering a wasm/web-targeted wrapper that provides a consistent interface that defaults to inference with wonnx if available, or CPU inference (presumably with tract) otherwise?
So, I don't know if tract is wasm-compatible. It probably is.
I just want to clarify that most modern laptops have an integrated graphics card. It might not be a powerful NVIDIA card, but it might be an integrated Intel GPU. So, wonnx should run on most devices. If it does not run on the device, there is a chance that the performance on CPU would be terrible anyway, as it is likely an old device.
Yep, agreed on GPU availability - I'm not too concerned about that, it's more WebGPU itself (which has, in my experience, been pretty hard to actually use on the web due to its instability). I assumed that wonnx wouldn't work on the WebGL2 backend because of the lack of compute shader functionality - if it does actually work, then I don't need the CPU fallback.
I see! So yes, even if a fallback were to exist, it would probably not work, as wonnx uses a lot of the new WebGPU API, such as ComputePipeline. I now understand why you guys want to fall back on tract.
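For the record, that limitation is detectable at runtime. A hedged sketch using wgpu's downlevel-capabilities API (the method was named get_downlevel_properties in older wgpu releases):

```rust
// Sketch: check whether the selected adapter supports compute shaders,
// which wonnx's compute pipelines require. On a WebGL2-level backend
// this flag is absent, so wonnx cannot run there.
fn supports_compute(adapter: &wgpu::Adapter) -> bool {
    adapter
        .get_downlevel_capabilities()
        .flags
        .contains(wgpu::DownlevelFlags::COMPUTE_SHADERS)
}
```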
So, I had a big discussion on another tract thread (#104) where we discussed integrating tract into wonnx, and I think that if we were to go that way, it would be easier and almost free to switch between one and the other.
I genuinely don't know when WebGPU is going to roll into stable on major browsers.
As far as I am aware, WebGPU in browsers is still stuck on security hesitations / implementation of proper sandboxing. The spec (esp. regarding WGSL syntax) has matured over the past few months and is actually usable in the browser when turned on...
Using tract instead of wonnx for inference is actually quite easy (see wonnx-cli, which can fall back to tract). Further integration is useful for e.g. being able to execute one op in wonnx and fall back to tract for another (in case it hasn't been implemented). Such refactoring could also allow other backends (ORT comes to mind, but it could also make sense to implement a WebGL2 fallback for some often-used ops if there is demand).
NB, I am curious whether browsers will implement software-emulated WebGPU in the absence of a hardware GPU (e.g. based on Lavapipe?). In that case wonnx would run universally (albeit a lot slower, but I wonder how efficient GPU emulation of wonnx ops is compared to CPU code running in e.g. WASM. It might be quite efficient!).
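Worth noting that wgpu already exposes a knob for explicitly requesting such a software adapter; a sketch (whether a given platform actually ships one is implementation-dependent):

```rust
// Sketch: ask wgpu for a fallback (software) adapter only, e.g. one
// backed by Lavapipe/LLVMpipe if the platform provides it.
async fn software_adapter(instance: &wgpu::Instance) -> Option<wgpu::Adapter> {
    instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            force_fallback_adapter: true,
            ..Default::default()
        })
        .await
}
```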
Thought about this some more, and I'm going to walk this back: I think it's better if wonnx focuses purely on WebGPU, and another crate wraps both wonnx and tract. That way, wonnx can focus on providing the best wgpu-based ONNX inference, and the hypothetical crate can focus on providing the subset supported by all backends.
That'd also make it easier to tackle WebNN and vendor-specific NN accelerators in future, too - wonnx wouldn't have to support those directly, and can focus just on GPU inference.
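To make that concrete, here is a hypothetical sketch of the wrapper crate's interface. None of these names (Inference, WonnxBackend, TractBackend) exist today; they only illustrate how one crate could hide the wonnx/tract split behind a single trait:

```rust
use std::collections::HashMap;

// Illustrative types only; a real wrapper would use proper tensor types.
type Tensor = Vec<f32>;
type Outputs = HashMap<String, Tensor>;

trait Inference {
    fn infer(&self, inputs: &HashMap<String, Tensor>) -> Result<Outputs, String>;
}

// One implementation would wrap a wonnx session (GPU)...
struct WonnxBackend; // placeholder for a wonnx session
impl Inference for WonnxBackend {
    fn infer(&self, _inputs: &HashMap<String, Tensor>) -> Result<Outputs, String> {
        todo!("delegate to wonnx")
    }
}

// ...and another would wrap a tract model plan (CPU).
struct TractBackend; // placeholder for a tract model plan
impl Inference for TractBackend {
    fn infer(&self, _inputs: &HashMap<String, Tensor>) -> Result<Outputs, String> {
        todo!("delegate to tract")
    }
}

// The wrapper picks a backend once; callers only ever see `dyn Inference`.
fn select_backend(gpu_available: bool) -> Box<dyn Inference> {
    if gpu_available {
        Box::new(WonnxBackend)
    } else {
        Box::new(TractBackend)
    }
}
```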
That's a good idea. I was planning to create some sort of wrapper anyway for the two crates before I used them in my project.
The wonnx-cli crate actually implements a basic wrapper around both wonnx (in gpu.rs) and tract (in cpu.rs). Each contains an implementation of the trait Inferer (https://github.com/webonnx/wonnx/blob/master/wonnx-cli/src/types.rs#L224).
Yeah, I saw that - that's awesome, that's already getting us quite far. Are there any plans to extract that into a crate of its own?
No but it shouldn’t be too difficult. I’d be happy to review a PR!