How to handle failed requests (5XX) of an online inference service?
When I load-test an online inference service with POST requests, I need to record every response so that latency and throughput can be calculated from them.
However, if a response's status code is not 200, recording it and including it in the latency calculation would skew the results, since a failed request does not reflect normal serving latency. Moreover, searching for peak performance with failed requests counted in would not find the true optimal QPS.
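To make the problem concrete, here is a minimal sketch of the kind of post-processing I have in mind: split recorded results into successes and failures, and compute latency statistics only over HTTP 200 responses. The `summarize` function and the `(status_code, latency)` result format are my own assumptions, not part of any existing tool.

```python
import statistics

def summarize(results):
    """Summarize load-test results given as (status_code, latency_seconds)
    pairs. Latency statistics are computed only over HTTP 200 responses,
    so 5XX failures do not distort the numbers; failures are reported
    separately as a count and an error rate."""
    ok = [latency for status, latency in results if status == 200]
    failed = [status for status, _ in results if status != 200]
    return {
        "total": len(results),
        "succeeded": len(ok),
        "failed": len(failed),
        "error_rate": len(failed) / len(results) if results else 0.0,
        "mean_latency_s": statistics.mean(ok) if ok else None,
        # 99th percentile needs at least two successful samples.
        "p99_latency_s": statistics.quantiles(ok, n=100)[98] if len(ok) >= 2 else None,
    }

# Example: three successes and one 503; the slow failed request
# is excluded from the latency statistics but counted as an error.
sample = [(200, 0.10), (200, 0.12), (503, 2.50), (200, 0.11)]
print(summarize(sample))
```

The open question is whether this exclude-failures accounting (or something better, such as backing off the request rate when 5XX responses appear) is already supported or recommended.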
Is there any support or recommended approach for handling this scenario?