[Feature Request] WebGPU/Wasm deployment
Request
Support for model inference in web browsers and web-compatible server-side JS runtimes like Deno.
Motivation
There has been an explosion of ML projects, demos, and libraries on the web; all of the following launched within the last couple of months:
- https://github.com/xenova/transformers.js
- https://github.com/mlc-ai/web-llm
- https://github.com/visheratin/web-ai
- https://github.com/0hq/WebGPT
- https://github.com/mlc-ai/web-stable-diffusion
- https://whisper.ggerganov.com/
This is in part thanks to the launch of WebGPU, and partly thanks to the years of toil from contributors in projects such as ONNX Runtime Web.
I expect the ecosystem to grow quite quickly in the coming months and years, especially as the planned ML-specific extensions are added to the WebGPU standard.
The Modular keynote mentioned edge deployment, so I'm hoping web is already included in the planned roadmap, but figured I'd make an issue just in case.
Description and Requirements
- Ideally there would be two options: Wasm-only inference, and Wasm+WebGPU inference. This is important because some models are too large to fit in GPU memory, but run fast enough on CPU that Wasm inference is still a viable option (see e.g. llama.cpp CPU performance).
- Quantization is important - mainly to reduce initial model download times.
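To illustrate the first requirement, here is a minimal sketch of how a loader might pick between the two backends at startup. The function name and the memory-budget heuristic are my own assumptions for illustration, not part of any existing Mojo or Modular API; the point is simply that WebGPU should be preferred when available and the model fits, with Wasm-on-CPU as the fallback (the llama.cpp case above).

```typescript
type Backend = "wasm" | "webgpu";

// Hypothetical backend selection: prefer WebGPU when the runtime exposes
// it and the model fits in the reported GPU memory budget; otherwise fall
// back to CPU-side Wasm inference.
function chooseBackend(
  hasWebGpu: boolean,      // e.g. derived from `navigator.gpu !== undefined`
  modelBytes: number,      // size of the (possibly quantized) weights
  gpuMemoryBytes: number,  // available GPU memory budget
): Backend {
  if (hasWebGpu && modelBytes <= gpuMemoryBytes) {
    return "webgpu";
  }
  return "wasm";
}
```

Quantization helps on both paths here: smaller weights download faster and are more likely to fit the GPU budget in the first place.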
Mojo/krustlet/cilium/k8s sounds like a really cool stack :smile:
Just speculating: Since MLIR translates to LLVM IR many building blocks for CPU-side WASM seem to already be in place. For WebGPU maybe one could link against a bitcode variant of wgpu/wgpu-native?
Awesome, this would be very cool and we're quite well set up to do this architecturally too. It's not on our immediate roadmap, but thank you for filing this!
Like Chris said, we are pretty much already set up for compiling to WASM, etc., but additional support for the ecosystem beyond the compilation target does not exist and is not on the roadmap for the near future. I'm closing this because I don't think it's something we need in our issue tracker.
additional support for the ecosystem beyond the compilation target
Hey @Mogball would you be able to clarify what you mean by this?
Also, maybe related: I'm wondering if Mojo could be hooked into IREE so that, IIUC, all the work beyond MLIR (including Wasm and WebGPU, which IREE targets) is already done?
All these things are very much possible, and that is indeed the power of full access to the whole MLIR ecosystem, but we are laser-focused on building out the language fundamentals right now.
WASM is already a heterogeneous target given that it can target WebGPU, and both of them can sit on top of exotic hardware. That very much sounds like a job for Mojo! +1 for adding this.