
Q: VegasLimit performs poorly when rate limited, if the rate limit is similar to or less than the window size?


We currently have servers with rate limits. We'd prefer to keep these, since at a certain level they're useful for enforcing good development practices. Right now, our backoff policy is randomized exponential backoff with a maximum number of backoffs per individual request.
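For concreteness, here is a minimal sketch of the kind of retry policy described above (the class name, constants, and full-jitter strategy are illustrative, not our production code):

```java
import java.util.concurrent.ThreadLocalRandom;

/** Randomized (full-jitter) exponential backoff with a per-request retry cap. */
final class RetryPolicy {
    private static final int MAX_RETRIES = 5;        // illustrative cap per request
    private static final long BASE_DELAY_MS = 100;   // illustrative base delay
    private static final long MAX_DELAY_MS = 10_000; // illustrative ceiling

    /** Returns the sleep before the given attempt (0-based), or -1 when retries are exhausted. */
    static long nextDelayMs(int attempt) {
        if (attempt >= MAX_RETRIES) {
            return -1; // give up and fail the request back to the caller
        }
        long exp = Math.min(MAX_DELAY_MS, BASE_DELAY_MS << attempt);
        return ThreadLocalRandom.current().nextLong(exp + 1); // uniform in [0, exp]
    }
}
```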

This does not work for providing backpressure: if the target rate is greater than the rate limit, some request will eventually (by bad luck or otherwise) exhaust its maximum number of retries and fail.

I've been investigating using this library to apply backpressure on the client side. The gist is that the server has a rate limit, and if the client exceeds it, the server immediately returns an HTTP 429. The client then backs off for a certain amount of time, after which it marks the request as 'dropped'.
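Concretely, the integration I have in mind looks roughly like this, assuming the library's Limiter/Listener API (acquire, onSuccess, onDropped, onIgnore); sendRequest() is a stub for the real HTTP call:

```java
import com.netflix.concurrency.limits.Limiter;
import java.util.Optional;

final class RateLimitedCall {
    /** Executes one call and reports its outcome to the limiter. */
    static void execute(Limiter<Void> limiter) {
        Optional<Limiter.Listener> acquired = limiter.acquire(null);
        if (!acquired.isPresent()) {
            return; // client-side limit reached: shed load before hitting the server
        }
        Limiter.Listener listener = acquired.get();
        try {
            int status = sendRequest(); // stub for the real HTTP call
            if (status == 429) {
                listener.onDropped(); // rate limited: count as a drop so the limit backs off
            } else {
                listener.onSuccess();
            }
        } catch (RuntimeException e) {
            listener.onIgnore(); // don't let unrelated failures skew the limit
            throw e;
        }
    }

    private static int sendRequest() {
        return 200; // stub
    }
}
```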

With AIMD, this works moderately well: eventually the client exceeds the rate limit and a request is dropped. In the next window, the concurrency limit is halved, which generally reduces the actual concurrency enough that the client stops hitting the rate limit.
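For reference, the sort of AIMD configuration I am describing, assuming the current builder APIs (SimpleLimiter, AIMDLimit); the 0.5 backoff ratio here makes the classic halving explicit and is not necessarily the default:

```java
import com.netflix.concurrency.limits.Limiter;
import com.netflix.concurrency.limits.limit.AIMDLimit;
import com.netflix.concurrency.limits.limiter.SimpleLimiter;

final class AimdConfig {
    /** Additive increase on success, multiplicative decrease (here x0.5) on a dropped sample. */
    static Limiter<Void> aimdLimiter() {
        return SimpleLimiter.newBuilder()
                .limit(AIMDLimit.newBuilder()
                        .initialLimit(10)  // illustrative starting limit
                        .backoffRatio(0.5) // halve the limit when a window saw a drop
                        .build())
                .build();
    }
}
```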

Now, the main issue here is that if the window takes longer to fill than the minimum time spent backing off, there is still some probability that a request fails at that point. For example, with 21 threads spinning on making requests, a rate limit of 20 requests per second, a successful request taking 1 second, and a rate-limited request finishing immediately, I will almost surely fail with AIMD: if the window is 100 samples, it takes 5 seconds to close the window, and we don't back off for 5 seconds.
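To spell out the arithmetic: 20 threads succeed at roughly 1 request per second each, so successful samples accumulate at about 20/s, and a 100-sample window needs 100 / 20 = 5 s of wall-clock time to close. Meanwhile the 429 responses to the 21st request return immediately, so any backoff shorter than those 5 s leaves the client hammering the server with doomed requests before the limit can react.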

The Vegas limit is even worse, since it is more aggressive at ramping up and can take many windows to ramp down, which means failure is roughly guaranteed when the rate limit is low compared to the window size.

It also looks like I should prefer the Vegas limit in general, since it is the one that actually responds to queueing rather than only to drops.

I'm wondering if there are any changes that could be made to the library to help with this. At the moment, it seems that only successful requests fill the window, so if I'm being rate limited, I could see many, many failures before the backoff occurs. Is this intentional?

I'm wondering whether it would be viable/desirable to have a 'dropped' request:

  1. immediately cause backoff to occur (would require changes to the gradient limit, but they're minor)
  2. debounce this so that it can only happen once per full window seen? (a client-side sketch of this follows)
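To make option 2 concrete at the application level, here is a minimal sketch of a wrapper that forwards at most one onDropped() per fixed interval and converts further drops into onIgnore(). Everything here is hypothetical: the interval merely stands in for 'one full window', which the client cannot observe directly, and the real fix would presumably live inside the limit implementations instead:

```java
import com.netflix.concurrency.limits.Limiter;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical client-side debounce: at most one drop per interval reaches the limit. */
final class DropDebouncer {
    private final long intervalNanos;
    private final AtomicLong lastDropNanos;

    DropDebouncer(long intervalNanos) {
        this.intervalNanos = intervalNanos; // stand-in for "one full window"
        this.lastDropNanos = new AtomicLong(System.nanoTime() - intervalNanos); // first drop passes
    }

    Limiter.Listener wrap(Limiter.Listener delegate) {
        return new Limiter.Listener() {
            @Override public void onSuccess() { delegate.onSuccess(); }
            @Override public void onIgnore()  { delegate.onIgnore(); }

            @Override public void onDropped() {
                long now = System.nanoTime();
                long last = lastDropNanos.get();
                // Only the first drop in each interval is reported as a drop.
                if (now - last >= intervalNanos && lastDropNanos.compareAndSet(last, now)) {
                    delegate.onDropped();
                } else {
                    delegate.onIgnore(); // suppress repeated drops within the interval
                }
            }
        };
    }
}
```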

Another question: would I end up with something drastically broken if the 'decrease function' for the Vegas limit were different for didDrop versus queueSize > beta?
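For context, my (possibly wrong) reading is that VegasLimit's builder exposes a single decrease function that is applied on both paths, so separating the two behaviours would need a library change. A sketch of the current shape, with illustrative numbers:

```java
import com.netflix.concurrency.limits.Limiter;
import com.netflix.concurrency.limits.limit.VegasLimit;
import com.netflix.concurrency.limits.limiter.SimpleLimiter;

final class VegasConfig {
    /** One decrease function covers both didDrop and queueSize > beta, as I read it. */
    static Limiter<Void> vegasLimiter() {
        return SimpleLimiter.newBuilder()
                .limit(VegasLimit.newBuilder()
                        .initialLimit(10)                // illustrative
                        .maxConcurrency(200)             // illustrative
                        .increase(limit -> limit + 1)    // illustrative additive ramp-up
                        .decrease(limit -> limit * 0.75) // applied on a drop OR a long queue
                        .build())
                .build();
    }
}
```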

j-baker · Aug 15 '18 18:08

I think didDrop=true should be passed when the rate limit is less than the window size.

Manjiz · Jan 18 '24 13:01