MLServer
Deal with expensive explain calls
Currently, the Explain endpoint is served through the V2 predict endpoint, which is synchronous by design (from the client's perspective).
Some explanations are expensive, especially when no GPU is available, and the explain call cannot complete synchronously (it runs into timeouts, etc.).
We may need to make it asynchronous, e.g. a `start_explain` call that returns an id, after which the client polls `get_explain_results` with that id (these names are just examples); a rough sketch of this pattern follows below.
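As a minimal sketch of what that could look like, here is an in-memory job-store pattern written against plain `asyncio`, completely independent of MLServer's actual codebase. All names here (`start_explain`, `get_explain_results`, `JobStore`, `ExplainJob`) are hypothetical placeholders taken from the examples above, not real MLServer APIs:

```python
# Hypothetical async-explain pattern: start a job, poll for its result.
# None of these names exist in MLServer; this only illustrates the idea.
import asyncio
import uuid
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ExplainJob:
    status: str = "running"  # "running" | "done" | "failed"
    result: Optional[Any] = None
    error: Optional[str] = None


class JobStore:
    """In-memory store mapping job ids to their state."""

    def __init__(self) -> None:
        self._jobs: Dict[str, ExplainJob] = {}

    def start_explain(self, payload: Any) -> str:
        """Kick off the expensive explanation in the background and
        immediately return an id the client can poll with."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = ExplainJob()
        asyncio.create_task(self._run(job_id, payload))
        return job_id

    def get_explain_results(self, job_id: str) -> ExplainJob:
        """Return the current state of the job; the client retries
        while the status is still 'running'."""
        return self._jobs[job_id]

    async def _run(self, job_id: str, payload: Any) -> None:
        job = self._jobs[job_id]
        try:
            # Placeholder for the real explainer call; a CPU-bound
            # explanation would run in a thread/process pool so the
            # event loop stays responsive.
            await asyncio.sleep(2)  # simulate an expensive explanation
            job.result = {"explanation": f"explained {payload!r}"}
            job.status = "done"
        except Exception as exc:
            job.error = str(exc)
            job.status = "failed"


async def main() -> None:
    store = JobStore()
    job_id = store.start_explain({"inputs": [1, 2, 3]})
    # Client-side polling loop
    while (job := store.get_explain_results(job_id)).status == "running":
        await asyncio.sleep(0.5)
    print(job.status, job.result, job.error)


if __name__ == "__main__":
    asyncio.run(main())
```

A real implementation would also need decisions this sketch skips: persisting jobs across server restarts, expiring finished results, and how the id-based endpoints map onto (or extend) the V2 protocol.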