Add kserve backend implementation for wasi-nn
This implements a kserve backend that forwards wasi-nn calls over HTTP to servers implementing the kserve prediction protocol (documented here: https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md). It also makes the API calls that are expected to be expensive async.
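For reference, a request against the kserve v2 REST API looks roughly like the sketch below. This is not the code in this PR, just an illustration of the protocol being targeted, assuming `reqwest` and `serde_json` for HTTP and JSON handling; the server address, model name, and tensor contents are placeholders.

```rust
use serde_json::json;

// Illustrative only: send a v2 inference request to a kserve-compatible
// server. The endpoint layout follows the kserve prediction protocol
// (POST /v2/models/{model_name}/infer); names and shapes are placeholders.
async fn infer_example() -> Result<serde_json::Value, reqwest::Error> {
    let client = reqwest::Client::new();
    let body = json!({
        "inputs": [{
            "name": "input_0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3, 0.4]
        }]
    });
    client
        .post("http://localhost:8000/v2/models/example-model/infer")
        .json(&body)
        .send()
        .await?
        .json::<serde_json::Value>()
        .await
}
```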
This makes it easy to offload inference workloads, via async methods, to an external service that can support all the various frameworks. Inference workloads tend to be resource-heavy, and the most popular frameworks have large security attack surfaces. Being able to control when and where they run makes it easier to use wasmtime with wasi-nn safely, without having to parse and execute models on arbitrary inputs in process.
This PR also implements a kserve registry so that models can be loaded as named models, and it adds better error reporting where possible within the current framework.
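As a rough sketch of what a registry keyed by model name could look like (the type and field names below are made up for illustration and are not the PR's actual types): guest-visible graph names map to models hosted on a kserve-compatible server.

```rust
use std::collections::HashMap;

// Hypothetical sketch: map wasi-nn graph names to models hosted on a
// kserve-compatible server. The real registry in this PR may differ.
struct KserveRegistryExample {
    server_url: String,
    // graph name as seen by the guest -> model name on the server
    models: HashMap<String, String>,
}

impl KserveRegistryExample {
    // Resolve a named graph to the server-side infer endpoint.
    fn infer_url(&self, graph_name: &str) -> Option<String> {
        self.models
            .get(graph_name)
            .map(|model| format!("{}/v2/models/{}/infer", self.server_url, model))
    }
}
```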
I removed the tensor bytes type; for now I'm assuming that a tensor with an initial zero dimension can be interpreted as a string, since other sentinel values would likely cause issues.
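To illustrate that convention, here is a hypothetical helper (not the PR's code): a tensor whose first dimension is zero is treated as UTF-8 string data in its byte buffer.

```rust
// Hypothetical illustration of the zero-dimension convention: a tensor
// whose first dimension is 0 carries UTF-8 string data; anything else is
// left to be interpreted as a normal numeric tensor.
fn tensor_as_string(dimensions: &[u32], data: &[u8]) -> Option<String> {
    if dimensions.first() == Some(&0) {
        String::from_utf8(data.to_vec()).ok()
    } else {
        None
    }
}
```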
Might as well do everything together at this point. Most TODOs are issues I noticed while working on the code and not TODOs for this particular feature.