MLServer
High number of concurrent requests causes MLServer microservice to time out
These predictions are fairly demanding and take ~20s to finish. If many requests hit the microservice concurrently, it becomes significantly slower (40s, 50s, 60s per request) until it starts to time out each request and no response is returned. No error messages are thrown either.
Adding more parallel workers solves the issue until, of course, even more requests start coming in simultaneously. So far I've experimented with up to 4 parallel workers.
Is there a way to limit how many requests each worker can handle? Say, a worker cannot handle more than 2 concurrent requests.
Hey @rlleshi ,
At the moment there isn't any way to rate limit requests, but it would be a reasonable feature.
In the meantime, it should be possible to control this at the ingress level.
@adriangonz so we can limit requests, yeah. But the microservice should not become unresponsive, right? Shouldn't it be handling jobs in a way that it won't be blocked by them?
Hey @rlleshi ,
Just to make sure we've got the right context, is that issue related to the original one? Or are you seeing issues after limiting requests?
Yes, it's the same issue. I have limited the request rate. But my point is that, regardless of any external request limit, the server ideally shouldn't become unresponsive when it gets a sudden spike in requests, right?
Sure thing @rlleshi, totally agree. The problem right now is that it's unclear what's causing that unresponsiveness. So far we haven't seen any problems arising from heavy traffic alone, so there could be a side effect in your environment that we can't see yet.
Do you have any insight on what may be causing that unresponsiveness? Is it perhaps that workers die (or get OOM-ed)? Or maybe K8s is throttling the CPU cycles?
Yep, the CPU cycles are getting throttled whenever there is an influx of requests. I'm running one worker per core.
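For anyone debugging the same symptom: cgroup v1 exposes throttling counters in the container's `cpu.stat` file (fields `nr_periods`, `nr_throttled`, `throttled_time`). A small parser, assuming that file format (the helper functions themselves are hypothetical):

```python
def parse_cpu_stat(text: str) -> dict:
    """Parse the key/value lines of a cgroup cpu.stat file."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats

def is_throttled(stats: dict) -> bool:
    # nr_throttled counts how many CFS periods hit the CPU quota;
    # anything above zero means the kernel is throttling the container.
    return stats.get("nr_throttled", 0) > 0
```

Inside the pod you would feed it the contents of `/sys/fs/cgroup/cpu/cpu.stat`; a growing `nr_throttled` under load confirms the K8s CPU limit is the bottleneck.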