MLServer
High number of concurrent requests causes MLServer microservice to time out
These predictions are fairly demanding and take ~20s to finish. If many requests hit the microservice concurrently, it becomes significantly slower (40s, 50s, 60s per request) until it starts to time out each request and no response is returned. No error messages are thrown either.
Adding more parallel workers solves the issue until, of course, even more requests start coming in simultaneously. So far I've experimented with up to 4 parallel workers.
Is there a way to limit how many requests each worker can handle? Say, a worker cannot handle more than 2 concurrent requests.
Hey @rlleshi ,
At the moment there isn't any way to rate limit requests, but it would be a reasonable feature.
In the meantime, it should be possible to control this at the ingress level.
@adriangonz so we can limit requests, yeah. But the microservice should not become unresponsive, right? Shouldn't it be handling jobs in a way that it won't be blocked by them?
Hey @rlleshi ,
Just to make sure we've got the right context, is that issue related to the original one? Or are you seeing issues after limiting requests?
Yes, it's the same issue. I have limited the request rate. But my point is that, regardless of any external request limit, the server ideally shouldn't become unresponsive when it gets a sudden spike in requests, right?
Sure thing @rlleshi, totally agree. The problem right now is that it's unclear what's causing that unresponsiveness. So far we haven't seen any problems arising from heavy traffic alone, so there could be a side effect in your environment that we can't see yet.
Do you have any insight on what may be causing that unresponsiveness? Is it perhaps that workers die (or get OOM-ed)? Or maybe K8s is throttling the CPU cycles?
Yep, the CPU cycles are getting throttled whenever there is an influx of requests. I'm running one worker per core.
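For anyone debugging the same symptom: cgroup v1 exposes throttling counters in the container's `cpu.stat` file (fields `nr_periods`, `nr_throttled`, `throttled_time`). A small parser, assuming that file format (the helper functions themselves are hypothetical):

```python
def parse_cpu_stat(text: str) -> dict:
    """Parse the key/value lines of a cgroup cpu.stat file."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats

def is_throttled(stats: dict) -> bool:
    # nr_throttled counts how many CFS periods hit the CPU quota;
    # anything above zero means the kernel is throttling the container.
    return stats.get("nr_throttled", 0) > 0
```

Inside the pod you would feed it the contents of `/sys/fs/cgroup/cpu/cpu.stat`; a growing `nr_throttled` under load confirms the K8s CPU limit is the bottleneck.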