
Default device if no gpu?

Open · rhobro opened this issue · 12 comments

Hi there,

I read that wonnx can use the GPU through graphics APIs like Metal and Vulkan. Just wondering: does it default to CPU inference if there is no GPU?

Thanks

rhobro commented May 28 '22

WONNX requires a GPU for inference and does not implement CPU inference itself.

However, on some systems without a hardware GPU, a software-emulated GPU may be available (see Lavapipe/llvmpipe), which WONNX can use; that would effectively be CPU inference.

Additionally, the WONNX CLI can fall back to CPU inference using the tract crate (this completely bypasses WONNX).
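
For illustration, here is a minimal sketch of how an application could probe for a usable (possibly software-emulated) adapter before deciding between wonnx and a CPU path. It talks to wgpu (the API wonnx is built on) directly, and assumes a wgpu version from around this time (~0.13) plus the pollster crate; the constructor signatures changed in later wgpu releases.

```rust
// Sketch: probe for a GPU adapter with wgpu; not part of the wonnx API.
fn adapter_info() -> Option<wgpu::AdapterInfo> {
    let instance = wgpu::Instance::new(wgpu::Backends::all());
    let adapter = pollster::block_on(instance.request_adapter(&wgpu::RequestAdapterOptions {
        power_preference: wgpu::PowerPreference::HighPerformance,
        // Set to true to prefer a software adapter such as llvmpipe, if present.
        force_fallback_adapter: false,
        compatible_surface: None,
    }))?;
    Some(adapter.get_info())
}

fn main() {
    match adapter_info() {
        // DeviceType::Cpu indicates a software adapter (e.g. Lavapipe/llvmpipe).
        Some(info) => println!("adapter: {} ({:?})", info.name, info.device_type),
        None => println!("no adapter; fall back to CPU inference (e.g. tract)"),
    }
}
```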

pixelspark commented May 28 '22

Hi there, I just came across this crate while looking for a Rust-native solution for inference on the web. Completely understand the focus on wgpu-driven inference here, but do you have any plans to / know of anyone who is offering a wasm/web-targeted wrapper that provides a consistent interface that defaults to inference with wonnx if available, or CPU inference (presumably with tract) otherwise?

philpax commented Jul 30 '22

So, I don't know if tract is wasm-compatible. It probably is.

I just want to clarify that most modern laptops have an integrated graphics card. It might not be a powerful NVIDIA card, but there is usually an integrated Intel GPU, so wonnx should run on most devices. If it does not run on a device, chances are that device is old enough that CPU performance would be terrible anyway.

haixuanTao commented Jul 30 '22

Yep, agreed on GPU availability - I'm not too concerned about that; it's more WebGPU itself (which has, in my experience, been pretty hard to actually use on the web due to its instability). I assumed that wonnx wouldn't work on the WebGL2 backend because of the lack of compute shader functionality - if it does actually work, then I don't need the CPU fallback.

philpax commented Jul 30 '22

I see! So yes, even if a WebGL2 fallback were to exist, it would probably not work, as wonnx uses a lot of the new WebGPU API, such as compute pipelines. I now understand why you guys want to fall back to tract.

So, I had a big discussion on another tract thread (#104) where we discussed integrating tract into wonnx, and I think that if we were to go that way, it would be easy (almost free) to switch between one and the other.

I genuinely don't know when WebGPU is going to roll into stable on major browsers.

haixuanTao commented Jul 30 '22

As far as I am aware, WebGPU in browsers is still stuck on security hesitations and the implementation of proper sandboxing. The spec (especially regarding WGSL syntax) has matured over the past few months and is actually usable in the browser when turned on...

Using tract instead of wonnx for inference is actually quite easy (see wonnx-cli, which can fall back to tract). Further integration would be useful for e.g. executing one op in wonnx and falling back to tract for another (in case it hasn't been implemented in wonnx). Such refactoring could also allow other backends (ORT comes to mind, but it could also make sense to implement a WebGL2 fallback for some often-used ops if there is demand).
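
For reference, a minimal CPU-only sketch with tract, roughly the shape of what the wonnx-cli CPU path does. The model path and input shape below are placeholders, and the calls follow tract's standard loading pipeline (model_for_path → into_optimized → into_runnable):

```rust
// Sketch: plain CPU inference via tract, bypassing wonnx entirely.
use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    let model = tract_onnx::onnx()
        .model_for_path("model.onnx")? // placeholder path
        // Many models need their input shape pinned before optimizing.
        .with_input_fact(0, InferenceFact::dt_shape(f32::datum_type(), tvec!(1, 3, 224, 224)))?
        .into_optimized()?
        .into_runnable()?;
    // Dummy input matching the declared shape.
    let input: Tensor = tract_ndarray::Array4::<f32>::zeros((1, 3, 224, 224)).into();
    let outputs = model.run(tvec!(input.into()))?;
    println!("output: {:?}", outputs[0]);
    Ok(())
}
```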

pixelspark commented Jul 31 '22

NB, I am curious whether browsers will implement a software-emulated WebGPU in the absence of a hardware GPU (e.g. based on Lavapipe?). In that case wonnx would run universally, albeit a lot slower; although I wonder how efficient GPU emulation of wonnx ops is compared to CPU code running in e.g. WASM. It might be quite efficient!

pixelspark commented Jul 31 '22

Thought about this some more, and I'm going to walk this back: I think it's better if wonnx focuses purely on WebGPU, and another crate wraps both wonnx and tract. That way, wonnx can focus on providing the best wgpu-based ONNX inference, and the hypothetical crate can focus on providing the subset supported by all backends.

That'd also make it easier to tackle WebNN and vendor-specific NN accelerators in the future - wonnx wouldn't have to support those directly and could focus just on GPU inference.
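
To make the shape of that idea concrete, here is a self-contained sketch of the interface such a wrapper crate could expose. Everything in it is hypothetical (the Inference trait, the backend types, and the stub bodies); a real version would delegate to a wonnx::Session and a tract runnable model respectively:

```rust
// Hypothetical wrapper-crate interface; none of these names exist today.
use std::collections::HashMap;

type TensorData = Vec<f32>; // simplified stand-in for a real tensor type

trait Inference {
    fn infer(&self, inputs: HashMap<String, TensorData>)
        -> Result<HashMap<String, TensorData>, String>;
}

struct GpuBackend; // would wrap a wonnx::Session
struct CpuBackend; // would wrap a tract runnable model

impl Inference for GpuBackend {
    fn infer(&self, inputs: HashMap<String, TensorData>)
        -> Result<HashMap<String, TensorData>, String> {
        // ...delegate to wonnx here (stub: echoes its inputs)...
        Ok(inputs)
    }
}

impl Inference for CpuBackend {
    fn infer(&self, inputs: HashMap<String, TensorData>)
        -> Result<HashMap<String, TensorData>, String> {
        // ...delegate to tract here (stub: echoes its inputs)...
        Ok(inputs)
    }
}

// Pick a backend once at startup (e.g. based on an adapter probe) and hand
// out a trait object so the rest of the application doesn't care which it is.
fn new_session(gpu_available: bool) -> Box<dyn Inference> {
    if gpu_available { Box::new(GpuBackend) } else { Box::new(CpuBackend) }
}

fn main() {
    let session = new_session(false);
    let out = session.infer(HashMap::from([("input".to_string(), vec![0.0; 4])]));
    println!("{:?}", out.map(|m| m.len()));
}
```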

philpax commented Aug 19 '22

That's a good idea. I was planning to create some sort of wrapper for the two crates anyway before using them in my project.

rhobro commented Aug 21 '22

The wonnx-cli crate actually implements a basic wrapper around both: wonnx in gpu.rs and tract in cpu.rs. Each contains an implementation of the Inferer trait (https://github.com/webonnx/wonnx/blob/master/wonnx-cli/src/types.rs#L224).

pixelspark commented Aug 21 '22

Yeah, I saw that - that's awesome, that's already getting us quite far. Are there any plans to extract that into a crate of its own?

philpax commented Aug 22 '22

No, but it shouldn’t be too difficult. I’d be happy to review a PR!

pixelspark commented Aug 22 '22