MLServer
Deal with expensive explain calls
Currently, the Explain endpoint is served through the V2 predict endpoint, which is synchronous by design (from the client's perspective).
Some explanations are expensive, especially when no GPU is available, and the explain call cannot complete synchronously (it runs into timeouts, etc.).
We may need to make it asynchronous, e.g. a `start_explain` call that returns an id, after which the client polls `get_explain_results` with that id (these names are just examples); a rough sketch of this pattern follows below.
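As a minimal sketch of what that could look like, here is an in-memory job-store pattern written against plain `asyncio`, completely independent of MLServer's actual codebase. All names here (`start_explain`, `get_explain_results`, `JobStore`, `ExplainJob`) are hypothetical placeholders taken from the examples above, not real MLServer APIs:

```python
# Hypothetical async-explain pattern: start a job, poll for its result.
# None of these names exist in MLServer; this only illustrates the idea.
import asyncio
import uuid
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ExplainJob:
    status: str = "running"  # "running" | "done" | "failed"
    result: Optional[Any] = None
    error: Optional[str] = None


class JobStore:
    """In-memory store mapping job ids to their state."""

    def __init__(self) -> None:
        self._jobs: Dict[str, ExplainJob] = {}

    def start_explain(self, payload: Any) -> str:
        """Kick off the expensive explanation in the background and
        immediately return an id the client can poll with."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = ExplainJob()
        asyncio.create_task(self._run(job_id, payload))
        return job_id

    def get_explain_results(self, job_id: str) -> ExplainJob:
        """Return the current state of the job; the client retries
        while the status is still 'running'."""
        return self._jobs[job_id]

    async def _run(self, job_id: str, payload: Any) -> None:
        job = self._jobs[job_id]
        try:
            # Placeholder for the real explainer call; a CPU-bound
            # explanation would run in a thread/process pool so the
            # event loop stays responsive.
            await asyncio.sleep(2)  # simulate an expensive explanation
            job.result = {"explanation": f"explained {payload!r}"}
            job.status = "done"
        except Exception as exc:
            job.error = str(exc)
            job.status = "failed"


async def main() -> None:
    store = JobStore()
    job_id = store.start_explain({"inputs": [1, 2, 3]})
    # Client-side polling loop
    while (job := store.get_explain_results(job_id)).status == "running":
        await asyncio.sleep(0.5)
    print(job.status, job.result, job.error)


if __name__ == "__main__":
    asyncio.run(main())
```

A real implementation would also need decisions this sketch skips: persisting jobs across server restarts, expiring finished results, and how the id-based endpoints map onto (or extend) the V2 protocol.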