django-ratelimit
django-ratelimit copied to clipboard
Document window staggering behavior
https://github.com/jsocol/django-ratelimit/blob/master/ratelimit/utils.py#L79
w = ts - (ts % period) + (zlib.crc32(value) % period)
I've found it's hard to understand the code, could you please explain why add a crc32 constant?
def _get_window(value, period):
ts = int(time.time())
if period == 1:
return ts
if not isinstance(value, bytes):
value = value.encode('utf-8')
w = ts - (ts % period) + (zlib.crc32(value) % period)
if w < ts:
return w + period
return w
Hmm, I thought there was an issue or at least commit comment at some point. This should be documented—at least the behavior if not the implementation.
_get_window()
gets a fixed counting window for the given value
that is staggered so not all windows expire at the same time.
If a rate was 200/1h
, we don't want everyone to be able to start making requests at exactly the same time (e.g. at 08:00, all ratelimited users are unratelimited again and traffic spikes). So we shift the windows. So my hour window might be from 08:21:12 to 09:21:12 and yours might be from 08:42:09 to 09:42:09 (and 10:42:09, etc). It does this for every unit except seconds.
crc32 is a decent, quick way of turning an arbitrary bytestring into an integer. It's distributed randomly-enough for our purposes here, doing a reasonable job of spreading out the windows.
Renaming and leaving this open as an issue to document the window staggering behavior.
@jsocol Thanks for your clear explaination. It's an excellent design on this occasion.
I believe this is already documented here. It might be better to move it out of the easily-missed "Upgrade Notes" section, though, since it's of interest to new users as well.
Aha! Thank you, @marfire—I thought it was written down somewhere.
I'll try to move the docs about window behavior (fixed, staggered) into a better, more obvious spot, but it might take me a little bit to get there. If anyone else has time to take a pass or at least suggest a spot they might expect to find that info in the existing docs, let me know!
You're welcome, and thank you for the package.
I think putting window information on the Rates page would be a natural fit.
I'm wondering if instead of staggering buckets, an algorithm like the token bucket algorithm would be an appropriate approach?
It seems like the token bucket accomplishes the goals of rate limiting while also avoiding the thundering herd problem since bucket capacity is replenished gradually, rather than being immediately refilled at the beginning of each period.
That is a really interesting approach! I hadn’t heard of it before. The Stripe post was really great, too, just as a resource. They didn’t describe their implementation, but I did find one from ClassDojo.
This algorithm is definitely more complex and I don’t think it’s a good fit for a default because I don’t see how you could avoid the race condition issues with Memcached—or generally without using methods not available through the Django cache interface. That would tightly couple ratelimit to Redis and I don’t think that’s OK. (I am happy to be proven wrong, though, if someone can describe a race-condition-free variant using only Django cache API methods.)
A long time ago there was a concept of pluggable backends that was removed in favor of relying directly and purely on the cache API. I still generally believe that’s the right call for this library, and in particular I’m not sure how the token bucket approach would work with the current @ratelimit
decorator API. But I think there’s definitely room for more libraries out there! Or if you want to open another issue with a proposal, please do!
Yup, makes sense. Thanks for the thoughts - I agree this would be hard to do in memcache!