[Feature Request] WebGPU/Wasm deployment
Request
Support for model inference in web browsers and web-compatible server-side JS runtimes like Deno.
Motivation
There has been an explosion of ML projects, demos, and libraries on the web; all of the following launched within the last couple of months:
- https://github.com/xenova/transformers.js
- https://github.com/mlc-ai/web-llm
- https://github.com/visheratin/web-ai
- https://github.com/0hq/WebGPT
- https://github.com/mlc-ai/web-stable-diffusion
- https://whisper.ggerganov.com/
This is in part thanks to the launch of WebGPU, and partly thanks to the years of toil from contributors in projects such as ONNX Runtime Web.
I expect the ecosystem to grow quite quickly in the coming months and years, especially as the planned ML-specific extensions are added to the WebGPU standard.
The Modular keynote mentioned edge deployment, so I'm hoping web is already included in the planned roadmap, but figured I'd make an issue just in case.
Description and Requirements
- Ideally there would be two options: Wasm-only inference, and Wasm+WebGPU inference. This is important because some models are too large to fit in GPU memory, but run fast enough on CPU that Wasm inference is still a viable option (see e.g. llama.cpp CPU performance).
- Quantization is important - mainly to reduce initial model download times.
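To illustrate the first requirement, here is a minimal sketch of how a loader might pick between the two backends at startup. The function name and the memory-budget heuristic are my own assumptions for illustration, not part of any existing Mojo or Modular API; the point is simply that WebGPU should be preferred when available and the model fits, with Wasm-on-CPU as the fallback (the llama.cpp case above).

```typescript
type Backend = "wasm" | "webgpu";

// Hypothetical backend selection: prefer WebGPU when the runtime exposes
// it and the model fits in the reported GPU memory budget; otherwise fall
// back to CPU-side Wasm inference.
function chooseBackend(
  hasWebGpu: boolean,      // e.g. derived from `navigator.gpu !== undefined`
  modelBytes: number,      // size of the (possibly quantized) weights
  gpuMemoryBytes: number,  // available GPU memory budget
): Backend {
  if (hasWebGpu && modelBytes <= gpuMemoryBytes) {
    return "webgpu";
  }
  return "wasm";
}
```

Quantization helps on both paths here: smaller weights download faster and are more likely to fit the GPU budget in the first place.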
Mojo/krustlet/cilium/k8s sounds like a really cool stack :smile:
Just speculating: Since MLIR translates to LLVM IR many building blocks for CPU-side WASM seem to already be in place. For WebGPU maybe one could link against a bitcode variant of wgpu/wgpu-native?
Awesome, this would be very cool and we're quite well set up to do this architecturally too. It's not on our immediate roadmap, but thank you for filing this!
Like Chris said, we are pretty much already set up for compiling to WASM, etc., but additional support for the ecosystem beyond the compilation target does not exist and is not on the roadmap for the near future. I'm closing this because I don't think it's something we need in our issue tracker.
additional support for the ecosystem beyond the compilation target
Hey @Mogball would you be able to clarify what you mean by this?
Also, maybe related: I'm wondering if Mojo could be hooked into IREE so that, IIUC, all the work beyond MLIR (including Wasm and WebGPU, which IREE targets) is already done?
All these things are very much possible, and that is indeed the power of full access to the whole MLIR ecosystem, but we are laser-focused on building out the language fundamentals right now.
WASM is already a heterogeneous target given that it can target WebGPU, and both of them can sit on top of exotic hardware. That very much sounds like a job for Mojo! +1 for adding this.