Add kserve backend implementation for wasi-nn
This implements a kserve backend that forwards wasi-nn calls over HTTP to servers implementing the kserve prediction protocol (documented here: https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md). It also makes the API calls that are expected to be expensive async.
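For reference, a request against the kserve v2 REST API looks roughly like the sketch below. This is not the code in this PR, just an illustration of the protocol being targeted, assuming `reqwest` and `serde_json` for HTTP and JSON handling; the server address, model name, and tensor contents are placeholders.

```rust
use serde_json::json;

// Illustrative only: send a v2 inference request to a kserve-compatible
// server. The endpoint layout follows the kserve prediction protocol
// (POST /v2/models/{model_name}/infer); names and shapes are placeholders.
async fn infer_example() -> Result<serde_json::Value, reqwest::Error> {
    let client = reqwest::Client::new();
    let body = json!({
        "inputs": [{
            "name": "input_0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3, 0.4]
        }]
    });
    client
        .post("http://localhost:8000/v2/models/example-model/infer")
        .json(&body)
        .send()
        .await?
        .json::<serde_json::Value>()
        .await
}
```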
This makes it easy to offload inference workloads, via async methods, to an external service that can support all the various frameworks. Inference workloads tend to be resource-heavy, and the most popular frameworks have large security attack surfaces. Being able to control when and where they run makes it easier to use wasmtime with wasi-nn safely, without having to parse and execute models on arbitrary inputs in process.
This PR also implements a kserve registry so that models can be loaded as named models, and it adds better error reporting where possible within the current framework.
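As a rough sketch of what a registry keyed by model name could look like (the type and field names below are made up for illustration and are not the PR's actual types): guest-visible graph names map to models hosted on a kserve-compatible server.

```rust
use std::collections::HashMap;

// Hypothetical sketch: map wasi-nn graph names to models hosted on a
// kserve-compatible server. The real registry in this PR may differ.
struct KserveRegistryExample {
    server_url: String,
    // graph name as seen by the guest -> model name on the server
    models: HashMap<String, String>,
}

impl KserveRegistryExample {
    // Resolve a named graph to the server-side infer endpoint.
    fn infer_url(&self, graph_name: &str) -> Option<String> {
        self.models
            .get(graph_name)
            .map(|model| format!("{}/v2/models/{}/infer", self.server_url, model))
    }
}
```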
I removed the tensor bytes type; for now I'm assuming that a tensor with an initial zero dimension can be interpreted as a string, since other sentinel values would likely cause issues.
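To illustrate that convention, here is a hypothetical helper (not the PR's code): a tensor whose first dimension is zero is treated as UTF-8 string data in its byte buffer.

```rust
// Hypothetical illustration of the zero-dimension convention: a tensor
// whose first dimension is 0 carries UTF-8 string data; anything else is
// left to be interpreted as a normal numeric tensor.
fn tensor_as_string(dimensions: &[u32], data: &[u8]) -> Option<String> {
    if dimensions.first() == Some(&0) {
        String::from_utf8(data.to_vec()).ok()
    } else {
        None
    }
}
```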
Might as well do everything together at this point. Most TODOs are issues I noticed while working on the code and not TODOs for this particular feature.