wasi-nn Support survey for stage 3

When we discussed a plan for moving wasi-nn to stage 3 in the WASI proposal process (August 2024), one point of feedback was a desire from the subgroup to collect a set of interested users who plan to use wasi-nn "in production." Though the term "production" was used loosely, it was clear that those asking for this wanted to identify a user group to maintain wasi-nn in the future. This issue intends to collect such a group.

We expect wasi-nn to have a more varied ecosystem than other WASI proposals: different host environments, different companies involved, a different user base. Since the proposal is a standardization effort across all of these, we want to make it clear to the WASI subgroup that those involved are working towards a common specification. To do so, please answer the following questions, providing any context you think is helpful:

Do you intend to use wasi-nn in production in the next year*?
Do you intend to maintain a compliant implementation in the next year*, bringing various wasi-nn extensions together in the WASI subgroup to create a unified wasi-nn specification?

* Feel free to replace "next year" with "near future;" as we discussed in the working group, different parties may have different timelines or may be reticent to share their roadmap.

Nov 25 '24 18:11 abrown

Do you intend to use wasi-nn in production in the next year*?

Yes, we are currently using wasi-nn in development and have built host implementations for wasmtime using llama.cpp and candle. These are rough but enable guest inference through wasi-nn. This will be promoted to production in "the near future."

However, the decision was made to extend wasi-nn to allow streaming tensor results. It appears that an early implementation found in witx and wasmedge (https://github.com/second-state/WasmEdge-WASINN-examples/blob/master/wasmedge-ggml/llama-stream/src/main.rs#L65) used a method that doesn't appear to be in the existing definition.

There may be a workaround, but ultimately we introduced streaming WIT definitions for graphs and tensors. The goal is to validate this approach and bring it here for discussion once it is cleaned up.

Do you intend to maintain a compliant implementation in the next year*, bringing various wasi-nn extensions together in the WASI subgroup to create a unified wasi-nn specification?

Yes, our goal is to lean heavily on the wasi-nn spec and its future versions. It is not clear to me yet how runtimes will be able to leverage a unified implementation (so far it is just guest code), maybe by using the SIMD spec and compiling a component that exports an inference tool directly. Regardless, it would be great to see more host interoperability and portability between runtimes like wasmedge and wasmtime, i.e., a wasi-nn component exporting the interface functions and guest code that imports wasi-nn for inference.

Jan 14 '25 22:01 elewis787

For our customers, we already helped implement wasi-nn in wasmtime with onnxruntime as the backend, precisely to support wasi-nn in Azure Kubernetes Service, here. In addition, work is currently being done to release wasi-nn support using both wasmtime and wamr implementations in Azure AIO; that work should appear this semester. There are two other projects entering production for wasi-nn that I am not at liberty to discuss yet but which should appear by the end of the calendar year.

Jan 16 '25 10:01 squillace

Do you intend to use wasi-nn in production in the next year*?

Yes, we (WasmEdge) have already used wasi-nn, with some extensions, in production this year. One example is the Gaia project, which has already deployed over 200K nodes that use WasmEdge, wasi-nn, and the llama.cpp backend to provide AI applications for their customers. We are also adding support for multi-modal use cases, including vision models (llama 3.2 vision, Qwen2-VL), voice-to-text models (whisper), and text-to-voice models (ChatTTS and more). The multi-modal showcases will be published in the near future.

Do you intend to maintain a compliant implementation in the next year*, bringing various wasi-nn extensions together in the WASI subgroup to create a unified wasi-nn specification?

Sure thing, we would like to support a unified wasi-nn specification. In particular, we are happy to work out a common solution across the different runtimes so that users get the same experience on each.

Jan 17 '25 10:01 hydai