How to handle failed requests (5XX) of an online inference service?
When I load-test an online inference service with POST requests, I need to record every response so that latency and throughput can be calculated from them.
However, if a response's status code is not 200, recording it and including it in the latency calculation would skew the results, since a failed request does not reflect normal serving latency. Moreover, searching for peak performance with failed requests counted in would not find the true optimal QPS.
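To make the problem concrete, here is a minimal sketch of the kind of post-processing I have in mind: split recorded results into successes and failures, and compute latency statistics only over HTTP 200 responses. The `summarize` function and the `(status_code, latency)` result format are my own assumptions, not part of any existing tool.

```python
import statistics

def summarize(results):
    """Summarize load-test results given as (status_code, latency_seconds)
    pairs. Latency statistics are computed only over HTTP 200 responses,
    so 5XX failures do not distort the numbers; failures are reported
    separately as a count and an error rate."""
    ok = [latency for status, latency in results if status == 200]
    failed = [status for status, _ in results if status != 200]
    return {
        "total": len(results),
        "succeeded": len(ok),
        "failed": len(failed),
        "error_rate": len(failed) / len(results) if results else 0.0,
        "mean_latency_s": statistics.mean(ok) if ok else None,
        # 99th percentile needs at least two successful samples.
        "p99_latency_s": statistics.quantiles(ok, n=100)[98] if len(ok) >= 2 else None,
    }

# Example: three successes and one 503; the slow failed request
# is excluded from the latency statistics but counted as an error.
sample = [(200, 0.10), (200, 0.12), (503, 2.50), (200, 0.11)]
print(summarize(sample))
```

The open question is whether this exclude-failures accounting (or something better, such as backing off the request rate when 5XX responses appear) is already supported or recommended.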
Is there any support or recommended approach for handling this scenario?