Support survey for stage 3
When we discussed a plan for moving wasi-nn to stage 3 in the WASI proposal process (August 2024), one point of feedback was a desire from the subgroup to collect a set of interested users who plan to use wasi-nn "in production." Though the term "production" was used loosely, it was clear that those asking for this wanted to identify a user group to maintain wasi-nn in the future. This issue intends to collect such a group.
We expect wasi-nn to have a more varied ecosystem than other WASI proposals: different host environments, different companies involved, a different user base. Since the proposal is a standardization effort across all of these, we want to make it clear to the WASI subgroup that those involved are working towards a common specification. To do so, please answer the following questions, providing any context you think is helpful:
- Do you intend to use wasi-nn in production in the next year*?
- Do you intend to maintain a compliant implementation in the next year*, bringing various wasi-nn extensions together in the WASI subgroup to create a unified wasi-nn specification?
* Feel free to replace "next year" with "near future"; as we discussed in the working group, different parties may have different timelines or may be reluctant to share their roadmap.
- Do you intend to use wasi-nn in production in the next year*?
Yes, we are currently using wasi-nn in development and have built host implementations for wasmtime using llama.cpp and candle. These are rough but enable guest inference through wasi-nn. This will be promoted to production in "the near future".
However, the decision was made to extend wasi-nn to allow for streaming tensor results. It appears that an early implementation found in witx and wasmedge (https://github.com/second-state/WasmEdge-WASINN-examples/blob/master/wasmedge-ggml/llama-stream/src/main.rs#L65) used a method that doesn't appear to be in the existing definition.
There may be a workaround, but ultimately we introduced streaming WIT definitions for graphs and tensors. The goal is to validate this approach and bring it here for discussion once it is cleaned up.
- Do you intend to maintain a compliant implementation in the next year*, bringing various wasi-nn extensions together in the WASI subgroup to create a unified wasi-nn specification?
Yes, our goal is to lean heavily on the wasi-nn spec and future versions. It is not clear to me yet how runtimes will be able to leverage a unified implementation (so far just guest code), maybe by using the SIMD spec and compiling a component that exports an inference tool directly. Regardless, it would be great to see more host interoperability and portability between runtimes such as wasmedge and wasmtime, i.e. a wasi-nn component exporting the interface functions and guest code that imports wasi-nn for inference.
For our customers, we already helped implement wasi-nn in wasmtime using onnxruntime as the backend, precisely to support wasi-nn in Azure Kubernetes Service, here. In addition, work is currently being done to release wasi-nn support using both wasmtime and wamr implementations in Azure AIO; that work should appear this semester. There are two other projects entering production for wasi-nn that I am not at liberty to discuss yet but which should appear by the end of the calendar year.
- Do you intend to use wasi-nn in production in the next year*?
Yes, we (WasmEdge) are already using wasi-nn, with some extensions, in production this year. Here is the Gaia project: Gaia has already deployed over 200K nodes that use WasmEdge, wasi-nn, and the llama.cpp backend to provide AI applications for their customers. We are also adding support for multi-modal use cases, including vision models (Llama 3.2 Vision, Qwen2-VL), voice-to-text models (Whisper), and text-to-voice models (ChatTTS and more). The multi-modal showcases will be published in the near future.
- Do you intend to maintain a compliant implementation in the next year*, bringing various wasi-nn extensions together in the WASI subgroup to create a unified wasi-nn specification?
Sure thing, we would like to support a unified wasi-nn specification. In particular, we are happy to work out a common solution across different runtimes to ensure the same experience.