
add rate limit algorithm clarification for global rate limits

Open switchupcb opened this issue 2 years ago • 27 comments

Description

The API documentation states that the Global Rate Limit is 50 requests per second.

Here is a scenario that currently occurs occasionally when interacting with the Discord API using a global rate limit bucket with a 1-second expiry time. The endpoint used in this scenario does NOT provide rate limit headers (potentially intended), contrary to https://github.com/discord/discord-api-docs/issues/1808#issuecomment-658535754. The actual global rate limit bucket header is NOT present unless you hit a 429, at which point the Retry-After header is present (intended).

101 requests are queued.
50 requests are sent from 0.0157s to 0.169s.
50 requests are received (at Discord) from 0.275s to 0.386s.
No requests will be sent until 1.275s.
50 requests are sent from 1.29s to 1.376s.
50 requests are received (at Discord) from 1.392s to 1.503s.
No requests will be sent until 2.275s.
1 request is sent at 2.277s.
1 request is rate limited at 2.355s.

View the full log.

NOTE: This method of handling works 100% of the time with a 1.1 second wait time.

Why did the request get rate limited?

See https://github.com/discord/discord-api-docs/issues/5144#issuecomment-1178468457.

How are global rate limits calculated from Discord?

A global rate limit uses a bucket algorithm with a reset time based on a Date header (with an unspecified precision; presented to the second).

Steps to Reproduce

The library used to run this test is incomplete: https://github.com/switchupcb/disgo/pull/14. However, in a similar manner to https://github.com/discord/discord-api-docs/issues/4875 (consensus), there is likely no issue with its logs. If you would like to reproduce the issue from scratch, you can perform the following:

  1. Implement a global rate limit bucket algorithm using buckets that account for latency.
  2. Run code that queues 101 - 1001 requests, sending 50 requests per second (a minimal driver sketch follows below).

See https://github.com/discord/discord-api-docs/issues/5144#issuecomment-1179633775 for a working implementation.
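For illustration, here is a minimal, self-contained driver sketch for step 2 (not the linked Go implementation): it pushes bursts of 50 requests per local-clock second until the queue is drained. The send_request coroutine is a hypothetical stand-in for an actual HTTP call to a Discord endpoint.

import asyncio
import time


async def send_request(i: int) -> None:
    # Hypothetical stand-in for an actual request to a Discord endpoint.
    await asyncio.sleep(0)
    print(f"request {i} sent at {time.perf_counter():.4f}s")


async def main(total: int = 101, per_second: int = 50) -> None:
    sent = 0
    while sent < total:
        window_start = time.perf_counter()
        batch = min(per_second, total - sent)
        await asyncio.gather(*(send_request(sent + i) for i in range(batch)))
        sent += batch
        # Sleep out the remainder of the 1-second window before the next burst.
        remaining = 1.0 - (time.perf_counter() - window_start)
        if remaining > 0 and sent < total:
            await asyncio.sleep(remaining)


asyncio.run(main())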

Expected Behavior

Explain in the documentation how the global rate limit is calculated from Discord's end in a similar manner to per-route rate limits.

Current Behavior

The documentation only states that the global rate limit is 50 requests per second.

Status: A pull request is being made to address this issue.

switchupcb avatar Jul 02 '22 02:07 switchupcb

In addition, please add how we determine which requests route to a bucket. The documentation states:

Per-route rate limits exist for many individual endpoints, and may include the HTTP method (GET, POST, PUT, or DELETE). In some cases, per-route limits will be shared across a set of similar endpoints, indicated in the X-RateLimit-Bucket header. It's recommended to use this header as a unique identifier for a rate limit, which will allow you to group shared limits as you encounter them.

What does may include the HTTP method indicate? Doesn't an "individual endpoint" imply a respective HTTP method already? Are we expected to only know which endpoints pertain to the same bucket when we encounter them (making 429s inevitable)? Otherwise, how can you determine the bucket (not bucket id) of an endpoint prior to sending that endpoint?

See https://github.com/discord/discord-api-docs/issues/5144#issuecomment-1177140586 for the answer to these questions.

switchupcb avatar Jul 02 '22 05:07 switchupcb

@devsnek confirmed https://github.com/discord/discord-api-docs/issues/5161 as a bug. Does this issue also contain a bug? The endpoint used in this scenario is GetCurrentBotApplicationInformation.

switchupcb avatar Jul 05 '22 22:07 switchupcb

In addition, please add how we determine which requests route to a bucket.


Based on this response from @advaith1 :

What does may include the HTTP method indicate?

The endpoint may or may not be specific to the HTTP method (GET, POST, PUT, or DELETE). For example, Get Auto Moderation Rule and Modify Auto Moderation Rule share the same endpoint, "/guilds/{guild.id}/auto-moderation/rules/{auto_moderation_rule.id}", but use separate HTTP methods (GET and PATCH). The GET method for this endpoint could have its own rate limit bucket, or it could share a rate limit bucket with the PATCH method.

Doesn't an "individual endpoint" imply a respective HTTP method already?

No.

Are we expected to only know which endpoints pertain to the same bucket when we encounter them (making 429s inevitable)?

Yes, unfortunately.

Otherwise, how can you determine the bucket (not bucket id) of an endpoint prior to sending that endpoint?

Initially, you can't.

Upon receiving a bucket header (the first time an endpoint is sent), classify that endpoint and HTTP method to its respective — application-side — rate limit bucket. Then, requests with the same endpoint and HTTP method — with potentially different parameters — can be sent within the confines of that bucket.

You still have to keep track of the rate limit bucket headers for each request, since the rate limit bucket can change. For example, a rate limit bucket that increases in size (on Discord's end) would result in a sub-optimal request speed for that endpoint if the application doesn't update it. In contrast, a rate limit bucket that decreases in size (on Discord's end) would result in unnecessary 429's.

Not every endpoint has a rate limit bucket beyond the requirement to adhere to the global rate limit. Bugs such as https://github.com/discord/discord-api-docs/issues/5161 are especially problematic when important rate limit headers are missing (such as Retry-After) because — in combination with a "correct" implementation — it can result in extreme undefined behavior. As a reminder, receiving 10000 429's within 10 minutes gets your bot banned for roughly an hour.

The other consideration you must make is whether a rate limit bucket changes the number of endpoints it contains. If a rate limit bucket that contains two separate endpoints splits into two rate limit buckets with one endpoint each, you will be sending requests at a sub-optimal speed. This problem is only fixed by always reading rate limit bucket headers and creating/updating the respective rate limit bucket. In a similar manner to a change in rate limit bucket size, a rate limit that adds another endpoint to its bucket can result in a 429. However, this problem is simply fixed by deleting buckets that hit 429s.

In other words, an endpoint + HTTP method's rate limit bucket can be determined after a request has been sent once, until that rate limit bucket changes. As a result, you must always read rate limit bucket headers and always update them per endpoint + HTTP method. More importantly, encountering a 429 as a result of rate-limit bucket changes is inevitable due to Discord's implementation of rate-limit bucket discovery.
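As a hedged sketch of that discovery process (the mapping and function names here are illustrative, not part of any library): the bucket for a route is only learned from the X-RateLimit-Bucket header of a previous response, is always overwritten in case it changed, and is dropped again when a 429 suggests the stored bucket is stale.

from typing import Dict, Optional, Tuple

# (HTTP method, endpoint template) -> bucket hash learned from X-RateLimit-Bucket.
route_to_bucket: Dict[Tuple[str, str], str] = {}


def record_bucket(method: str, endpoint: str,
                  headers: Dict[str, str], status: int) -> Optional[str]:
    """Learn or update the bucket for a route from a response, returning it."""
    bucket = headers.get("X-RateLimit-Bucket")
    if bucket is not None:
        # Always overwrite: the bucket a route belongs to can change over time.
        route_to_bucket[(method, endpoint)] = bucket
    elif status == 429:
        # Rate limited without a bucket header: forget the stored bucket and
        # fall back to treating the route as only globally rate limited.
        route_to_bucket.pop((method, endpoint), None)
    return route_to_bucket.get((method, endpoint))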

switchupcb avatar Jul 07 '22 06:07 switchupcb

Why did the request get rate limited?

In the following partial log (with a 1.01s wait time), we can see that the main issue occurs due to sending requests 1 - 10 ms faster than the Discord server. This is why the rate limits typically occur near the start/end of rate limit buckets (with that implementation).

Request 50 429 PARSED HEADER &{0 0 0 0  true global} PARSED RETRY AFTER 1
        from 2022-07-07 19:46:43.3999559 -0500 CDT m=+5.352960901
 HTTP/1.1 429 Too Many Requests
Server: cloudflare
Date: Fri, 08 Jul 2022 00:46:43 GMT
Content-Type: application/json
Content-Length: 91
Connection: keep-alive
Retry-After: 1
X-Ratelimit-Scope: global
X-Ratelimit-Global: true
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
X-Envoy-Upstream-Service-Time: 17
...

 {
  "global": true,
  "message": "You are being rate limited.",
  "retry_after": 0.005
}

Retry After: .005 * 1000 = 5 ms
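Note the precision mismatch in the response above: the Retry-After header is rounded up to a whole second (1), while the JSON body carries millisecond precision (0.005). A small sketch, assuming a generic headers mapping and raw body, of preferring the JSON field when it is present:

import json
from typing import Mapping


def retry_after_seconds(headers: Mapping[str, str], body: bytes) -> float:
    # Prefer the millisecond-precision retry_after from the JSON body and
    # fall back to the whole-second Retry-After header.
    try:
        return float(json.loads(body)["retry_after"])
    except (ValueError, KeyError):
        return float(headers.get("Retry-After", "1"))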

In a separate run.

See edited.
4ms
9ms
1ms
2ms

In the original run, a request is sent within 2 ms of an exact expiry and round-tripped (pinged) in 78 ms. However, we cannot guarantee when the Discord server receives the request (e.g., at 34 ms) or when it's processed. Luckily, the server sends a Date header with the second that the request was received at. Unfortunately, the Date header only has precision to the second.

The implication is that — in the original run — a rate limit could have occurred because the second in which we sent the first request of the new bucket was unaligned with the second on Discord's server, or because it occurred milliseconds earlier than necessary. This would put 51 requests into one Discord second, resulting in a 429.

It was also possible for 50 requests to be sent, take longer than the expiry to arrive, and end up being processed after the first request of the next bucket was sent or processed, so that they all land in the same Discord second and result in a 429. However, this never occurred in any of those tests.

In short, the Date header is important for correct rate-limit handling and should be noted in the documentation accordingly.

switchupcb avatar Jul 08 '22 02:07 switchupcb

The date header is formatted in the RFC 1123 Time Format.

Fri, 08 Jul 2022 10:26:16 GMT
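For reference, this format can be parsed with the Python standard library; a minimal sketch:

from email.utils import parsedate_to_datetime

received = parsedate_to_datetime("Fri, 08 Jul 2022 10:26:16 GMT")
print(received.timestamp())  # Unix time; the header itself is truncated to the second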

switchupcb avatar Jul 08 '22 10:07 switchupcb

This log corresponds with the previous commit and showcases how global rate limit buckets either:

  1. Do not expire exactly on the second despite being presented at second precision (i.e. Fri, 08 Jul 2022 10:26:16 GMT), or
  2. Use a different method of calculating the actual rate limit.

I am biased towards 1 because of the following information.

received 2022-07-08 23:34:34 +0000 GMT
        from 2022-07-08 18:34:35.0135301 -0500 CDT m=+0.519347601
        at 2022-07-08 18:34:35.1634324 -0500 CDT m=+0.669249901 49 1
        account prior
        date 2022-07-08 23:34:34 +0000 GMT
                now 2022-07-08 18:34:35.1640879 -0500 CDT m=+0.669905401 50 0
                exp 2022-07-08 18:34:36 -0500 CDT
        429 &{0 0 0 0  true global} Retry-After 1 retry_after 0.661

This partial log indicates that a request was received (on Discord's end) at 2022-07-08 23:34:34 +0000 GMT. The application sent the request at 2022-07-08 18:34:35.0135301 -0500 CDT m=+0.519347601 which occurs ~13.5 ms after 18:34:35. The application received the response (from Discord) at 2022-07-08 18:34:35.1634324 -0500 CDT m=+0.669249901.

account prior can be ignored: It accounts for the fact that this request was received (on Discord's end) in the previous rate limit bucket, when the application allocated the request to be received in the current rate limit bucket. As a result, it grants the current rate limit bucket an additional request.

The retry_after JSON response shows .661s (661ms) remaining until the "bucket" is reset. This would indicate that Discord's bucket isn't resetting at 18:34:34.0 but at 18:34:34.6... or 18:34:35.6.... In addition, the request that was sent and received AFTER 18:34:34 (from 18:34:35.0135301) is marked as 18:34:34 instead of 18:34:35.

If it is case 2, we need Discord to confirm the actual way to handle this.

switchupcb avatar Jul 09 '22 00:07 switchupcb

I don't think I understand the issue here. There's no summary of what you mean by this, and I don't think you reached any type of finalized point. It is just stated information surrounding behaviour you experienced.

It doesn't matter when Discord's actual buckets really reset. As long as you have a type of timed semaphore which guarantees that there are not more than 50 requests per second, it doesn't matter much when the window opens/closes. No matter the drift in when you open/close the window, you should not exceed Discord's ratelimits, assuming you do not send more than 50 requests inside your window.

Here's the implementation I use for global ratelimits (simplified):

import time

import anyio


class GlobalRatelimit:
    def __init__(self) -> None:
        self._lock = anyio.Lock()

        self._reset = None   # perf_counter timestamp of when the window resets
        self._value = 50     # requests left in the current window

    async def wait(self) -> None:
        async with self._lock:
            if self._reset is None or self._reset < time.perf_counter():
                # Window expired (or first request): open a fresh 1-second window.
                self._reset = time.perf_counter() + 1
                self._value = 50

            elif self._value <= 0:
                # Window exhausted: sleep until it resets, then open a new one.
                await anyio.sleep(self._reset - time.perf_counter())
                self._reset = time.perf_counter() + 1
                self._value = 50

            self._value -= 1

As for route-specific ratelimits: that is a much more complicated question, with several more strategies one might consider. The way I like to think about ratelimits is using the following pseudo-diagram:

                      Ratelimit
            ----------^^^^^^^^^------------
            Hash          +     Major params
   ---------^^^^-------
   Route      +   Route
---^^^^^-----   ---^^^^^----
Method + Path   Method + Path 

A method and path make up a route, multiple routes make up a bucket (which you receive from Discord). Finally, you can construct an accurate representation of a ratelimit with the bucket hash and its major parameters (channel_id, guild_id, webhook_id/interaction_id).
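A hedged sketch of that composition, with illustrative type names (nothing here is a Discord-defined structure):

from typing import NamedTuple, Optional


class Route(NamedTuple):
    # Method + Path make up a route, e.g. ("GET", "/channels/{channel_id}/messages").
    method: str
    path: str


class RatelimitKey(NamedTuple):
    # Bucket hash + major parameters identify one actual ratelimit.
    bucket_hash: str
    channel_id: Optional[int]
    guild_id: Optional[int]
    webhook_id: Optional[int]


def make_key(bucket_hash: str, channel_id: Optional[int] = None,
             guild_id: Optional[int] = None,
             webhook_id: Optional[int] = None) -> RatelimitKey:
    # Multiple routes may share a bucket hash; the major parameters then
    # split that shared bucket per channel/guild/webhook.
    return RatelimitKey(bucket_hash, channel_id, guild_id, webhook_id)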

Since you do not know the bucket hash of routes before they are made, you will either need two "modes" or make HEAD requests to a specific route to receive the headers.

I use the former approach. That is, in my implementation I have 3 mappings used:

  • (buckets) Mapping of Route to X-RateLimit-Bucket
  • (locks) Weak-referenced mapping of Hash + Major param to a Ratelimit object*
  • (fallbacks) Weak-referenced mapping of Route to Ratelimit object*

The last mapping is used as a fallback if the first mapping does not have an X-RateLimit-Bucket for the specific Route. Only the first mapping holds items forever, because the number of Routes in existence is bounded by how many your library supports. The other two mappings' items can grow infinitely and hold no use unless they're in use.

*A Ratelimit object is the synchronization implementation I use; it will be explained below.

When a request is about to be made, the Route is first looked up in the buckets mapping. Depending on whether a result is found, a Ratelimit object is created or looked up (depending on whether it already exists) in either the second or the third mapping, with its value and limit starting at 1.
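A rough sketch of that lookup using plain dicts (the comment describes weak-referenced mappings, and the Ratelimit class below is only a placeholder for the synchronization object explained next):

from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Ratelimit:
    # Placeholder for the timed-semaphore object described below.
    limit: int = 1
    remaining: int = 1


Route = Tuple[str, str]  # (method, path)

buckets: Dict[Route, str] = {}                  # Route -> X-RateLimit-Bucket
locks: Dict[Tuple[str, tuple], Ratelimit] = {}  # (hash, major params) -> Ratelimit
fallbacks: Dict[Route, Ratelimit] = {}          # Route -> Ratelimit


def get_ratelimit(route: Route, major_params: tuple) -> Ratelimit:
    # Look up (or create) the Ratelimit object guarding this request.
    bucket = buckets.get(route)
    if bucket is not None:
        return locks.setdefault((bucket, major_params), Ratelimit())
    # No known bucket hash yet: fall back to a per-route Ratelimit object.
    return fallbacks.setdefault(route, Ratelimit())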

The Ratelimit object, of which there exists one per Route / Hash + Major params, is now acquired. The implementation of this is very similar to that of the global ratelimit (a timed semaphore without a management task), had it not been simplified above. There are a lot of considerations that need to be made. Here's a list of the information I keep around in these objects:

  • A lock, ensuring that acquiring happens in sequence (if one request needs to sleep, all the following needs to as well)
  • An event specifying whether the route is ratelimited (set to allow requests), to be able to handle surprise sub-route ratelimits
  • A timestamp of when the bucket next resets
  • An event for notifying the current request that a timestamp has been acquired
  • The amount of remaining requests
  • The limit of the bucket

Additionally, there are some safety mechanisms in case requests fail and do not get headers as expected:

  • An accurate count of in-progress requests

These values are all actively used in my ratelimiting. Now, here's a written explanation of how this logic works:

  1. The Ratelimit object starts by acquiring its lock, waiting if necessary, to make sure sleeping happens in sequence.
  2. After it has been given ownership of the lock it starts by checking whether it is currently ratelimited using the first event, waiting if necessary.
  3. Then, it checks the remaining requests and returns* if possible.
  4. In the common case where there are no more remaining requests, it checks whether there is a known timestamp for when the current window resets.
  5. If this is not yet known, the Ratelimit object will wait on the second event to be notified of the time that the window resets. After this has been set, the Ratelimit object checks the remaining requests again** and returns* if there are any.
  6. Assuming there are no remaining requests, the Ratelimit object now sleeps until the window resets (unless that time is in the past, in which case it can skip the sleep).
  7. After the sleep, it is now possible to set the remaining requests to limit - 1, remove the known window reset (null/None), and return*.

*When returning, the lock is released to the next request, and the in-progress requests is incremented by one.

**This is done because of the behaviour seen below when receiving a response.
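For readers who want something concrete, here is a much-simplified sketch of steps 1, 3, 6 and 7 (it deliberately omits the two events, the in-progress counter and the 429 handling described above, and is not the linked implementation):

import asyncio
import time
from typing import Optional


class SimpleRatelimit:
    def __init__(self, limit: int = 1) -> None:
        self._lock = asyncio.Lock()
        self.limit = limit
        self.remaining = limit
        self.reset_at: Optional[float] = None  # monotonic time of the next window reset

    async def acquire(self) -> None:
        async with self._lock:  # step 1: sleeping happens in sequence
            if self.remaining > 0:
                self.remaining -= 1  # step 3: a request slot is free
                return

            # step 6: no slots left, sleep until the window resets (if known)
            if self.reset_at is not None:
                delay = self.reset_at - time.monotonic()
                if delay > 0:
                    await asyncio.sleep(delay)

            # step 7: start the new window and claim a slot for this request
            self.reset_at = None
            self.remaining = self.limit - 1

    def update(self, limit: int, remaining: int, reset_after: float) -> None:
        # Feed back values from the X-RateLimit-* headers of a response
        # (simplified compared to the diff-based update described below).
        self.limit = limit
        self.remaining = min(self.remaining, remaining)
        self.reset_at = time.monotonic() + reset_after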

The ratelimit has now been acquired and the request is made. The ratelimiter and Ratelimit object can now be updated with the information present in the response (depending on whether all headers are returned):

  • The buckets mapping is unconditionally updated and Ratelimit object moved to the locks mapping (from fallbacks)
  • The Ratelimit object's limit value is updated to the one received by the request and have the remaining value updated:
    val = ...
    
    diff = limit - remaining
    limit = val
    remaining = limit - diff
    
  • The Ratelimit object's remaining value is updated, only if the response returned a value smaller than the locally stored one. This should not happen and is not expected behaviour, but if other processes also make requests this attempts to handle that as gracefully as possible
  • The Ratelimit object's timestamp of window reset is updated, setting the event in-case other requests are waiting, then clearing the event to prepare it for the next response

Lastly, the Ratelimit object's release method is called, which decrements the number of in-progress requests. If this value falls to 0, it verifies that the reset of the current window is known. If it is not known, to ensure that there is no deadlock, it adds one to the remaining requests and sets the event used to notify requests about the update of the window reset (this is why the remaining requests are checked again, marked by **). This means that in my implementation, requests which do not receive headers run one-by-one.

In the event that a 429 response is received (the above still runs to the best of its abilities depending on the information received) the first event is reset, which makes following requests wait until it is set again.

You can find my current implementation here (Python, of course), which I like to think is pretty well commented. I can't promise that it is perfect, but this explanation and the code should give you some thoughts into how to design such a system. The special _RouteRatelimit object is a wrapper for each request made, tying together the Ratelimit lock and the request made; it is responsible for updating the Ratelimit object after each request.

I have no idea whether this will be to any help, but there may be future readers of this issue.

Bluenix2 avatar Jul 09 '22 03:07 Bluenix2

I don't think I understand the issue here. — @Bluenix2

Information surrounding global rate limit handling is too ambiguous for a straightforward implementation.

It is just stated information surrounding behaviour you experienced.

See https://github.com/discord/discord-api-docs/issues/5144#issue-1291939218. Multiple tests are provided throughout the post. Surrounding behavior is included to help solve the issue.

As long as you have a type of timed semaphore which guarantees that there are not more than 50 requests per second, it doesn't matter much when the window opens/closes.

Not according to the tests above.

Here's the implementation I use for global ratelimits (simplified)

Did you test this at 50 requests per second?

Since you do not know the bucket hash of routes before they are made, you will either need two "modes" or make HEAD requests to a specific route to receive the headers.

[Further Implementation]

Great writeup! Implementation details will differ between libraries. We will use a code generator to map all ~170 endpoints to an id prior to compilation, then maintain a synchronized map of those ids to rate limit bucket objects (updated upon each request). This eliminates the need for a third mapping.

After the sleep, it is now possible to set the remaining requests to limit - 1, remove the known window reset (null/None), and return*.

You need to account for the chance that you send 50 requests, but Discord does not receive them until multiple rate limit buckets later. Alternatively, account for the case where a sent request is not counted towards the current bucket you reset (see https://github.com/discord/discord-api-docs/issues/5144#issuecomment-1179439967).

The Ratelimit object's remaining value is updated, only if the response returned a value smaller than the locally stored one. This should not happen and is not expected behaviour, but if other processes also make requests this attempts to handle that as gracefully as possible

This is expected behavior for a concurrency safe library. You can use a mutex or atomicity while setting the new remaining requests value.

This means that in my implementation, requests which do not receive headers run one-by-one.

Not acceptable.

switchupcb avatar Jul 09 '22 09:07 switchupcb

Did you test this at 50 requests per second?

I have done tests in the past, but now I ran a completely over-the-top test against the Discord API.

from typing import NoReturn, SupportsInt

import anyio
from wumpy.rest import APIClient


# 12 channels, spamming as many requests as it can which means we end up with
# bursts of 60 requests (more than the global limit of 50 rps)
CHANNELS = [
    878988417436905483, 878988447593930783, 878988471971250186,
    878988500165353492, 878988522416111616, 878988559833526272,
    878988577495711814, 878988605996027924, 878988636387962890,
    878988636387962890, 878988667727777882, 878988690997772288,
]

async def repeat(api: APIClient, channel: SupportsInt) -> NoReturn:
    while True:
        await api.send_message(channel, content='Testing...')


async def spam_channel(api: APIClient, channel: SupportsInt) -> NoReturn:
    async with anyio.create_task_group() as tg:
        # We spawn 6 tasks, to get ratelimited even on the route-specific
        # ratelimits (which have a limit of 5).

        tg.start_soon(repeat, api, channel)
        tg.start_soon(repeat, api, channel)
        tg.start_soon(repeat, api, channel)
        tg.start_soon(repeat, api, channel)
        tg.start_soon(repeat, api, channel)
        tg.start_soon(repeat, api, channel)


async def run_spam(api: APIClient) -> NoReturn:
    async with anyio.create_task_group() as tg:
        for channel in CHANNELS:
            tg.start_soon(spam_channel, api, channel)


async def main() -> None:
    async with APIClient(...) as api:  # credentials elided in the original snippet
        await run_spam(api)


anyio.run(main)

I do hit the global ratelimit, yes. This could have to do with how requests take different amounts of time and appear in later windows (which you brought up about my implementation for route-specific ratelimits), but I am not sure this explains everything. I suppose you can consider this issue reproduced by me.

Generally speaking though, some libraries only handle this issue after Discord tells them. If your bot is big enough to run into issues with the global ratelimiter you are most likely eligible for an increased global ratelimit.

You need to account for the chance that you send 50 requests, but Discord does not receive them until multiple rate limit buckets later. Alternatively, account for the case where a sent request is not counted towards the current bucket you reset (see https://github.com/discord/discord-api-docs/issues/5144#issuecomment-1179439967).

I don't think I have considered this, no, seeing as this must be a very rare case. This does not need to be handled when requests run in sequence, but it does because of how I let requests run concurrently, yes. I could take a look at the in-progress requests and assume that they have been slow enough to take up slots in the new window.

This is expected behavior for a concurrency safe library. You can use a mutex or atomicity while setting the new remaining requests value.

I don't need to, but I use locks wherever necessary. Python's async/await is not green threads (like Goroutines are); it is cooperatively scheduled, where await marks the checkpoint. Setting the remaining requests does not yield, so no lock is necessary. I think this errs on being irrelevant to this issue; feel free to DM me on Discord (Bluenix#7543), I'd love to talk more.

Not acceptable.

Why is that? How would you recommend I handle this instead? The ratelimiter is supposed to prevent 429 responses. When all requests in a particular window fail and I do not know whether they counted towards the ratelimit, or I get no ratelimiting headers, I am left in the dark and err on the side of caution. It falls back to sequential ratelimiting, which tends to be safer.

Bluenix2 avatar Jul 10 '22 00:07 Bluenix2

https://github.com/switchupcb/disgo/pull/14#issuecomment-1179632428 confirms the (possible) behavior of the global rate limit algorithm with a working global rate limiter implementation.

The following log corresponds with the previous commit and updates the global rate limit bucket by using the Discord Date as the single source of truth. An assumption is made that the Global Rate Limit is implemented (on Discord's end) using a server-based bucket algorithm where a 50-token bucket is reset at a specific point in time (every second).

This is opposed to a servicer-based (leaky?) bucket algorithm where a 50-token bucket is reset (created) when the server handles the first request, and sets its expiry to every second thereafter.
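A rough sketch of what that assumption means for a client (illustrative only, and not the linked Go code): count responses against the whole server-side second stamped into the Date header, and treat that counter, rather than the local clock, as the source of truth for how full Discord's current 50-token bucket is.

from collections import defaultdict
from email.utils import parsedate_to_datetime
from typing import DefaultDict

# Requests acknowledged per server-side second, keyed by the (whole-second)
# Unix timestamp parsed from each response's Date header.
requests_per_server_second: DefaultDict[int, int] = defaultdict(int)


def record_response(date_header: str) -> int:
    # Count this response against the server second it was stamped with and
    # return how many requests that second has absorbed so far.
    second = int(parsedate_to_datetime(date_header).timestamp())
    requests_per_server_second[second] += 1
    return requests_per_server_second[second]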

Solution: Using the information collected in this thread, I will create a Pull Request that provides the expected behavior of this issue.

switchupcb avatar Jul 10 '22 01:07 switchupcb

I suppose you can consider this issue reproduced by me. — @Bluenix2

Thanks for confirming!

Generally speaking though, some libraries only handle this issue after Discord tells them.

https://github.com/discord/discord-api-docs/issues/108 states that "[libraries] should be aware that [they are] being rate limited on a given method, and not attempt to send any requests during a period that it should know would be immediately rate limited. Please refer to our new docs on Rate Limiting."

However, I find that most libraries (including the main one for Go, DiscordGo) only handle per-route rate limits. Handling global rate limits is not required to be compliant, which is why there was a lack of documentation for that topic.

Why is that? How would you recommend I handle this instead?

Due to being unexpected. It's not acceptable for the documentation to be in a state where you have to do this. It's not acceptable on the library end because "err on the side of caution" implies that there are tradeoffs (performance, etc.). Falling back to sequential rate limiting may prevent errors, but as a user of a library I may or may not expect those errors and may prefer the performance of concurrent requests.

switchupcb avatar Jul 10 '22 01:07 switchupcb

Due to being unexpected. It's not acceptable for the documentation to be in a state where you have to do this. It's not acceptable on the library end because "err on the side of caution" implies that there are tradeoffs (performance, etc.). Falling back to sequential rate limiting may prevent errors, but as a user of a library I may or may not expect those errors and may prefer the performance of concurrent requests.

This is partially Discord's fault, yes, but not all the blame can be put on Discord. There are routes which do not produce ratelimit headers from what I remember, which I would advocate for being fixed. The documentation could be clarified about whether ratelimiters should treat the endpoint as unknown (sequential requests, going as safe as possible) or if it can treat the endpoint as falling under the global ratelimit (technically unlimited).

That said, the ratelimiter may also not receive headers in case of errors and other anomalies, which I don't think we can fairly blame Discord for. This code was put in place to ensure that the ratelimiter does not lock up forever when that happens. The errors do propagate back to the user, but the process doesn't exit and the ratelimiter needs to self-heal for future requests.

Bluenix2 avatar Jul 10 '22 01:07 Bluenix2

This is partially Discord's fault, yes, but not all the blame can be put on Discord. There are routes which do not produce ratelimit headers from what I remember, which I would advocate for being fixed. The documentation could be clarified about whether ratelimiters should treat the endpoint as unknown (sequential requests, going as safe as possible) or if it can treat the endpoint as falling under the global ratelimit (technically unlimited). — @Bluenix2

Partial is an understatement. The main issue in this case is that these issues are unavoidable. Discord justifies the current rate limit implementation with the fact that it's common in the industry. That's entirely fine. The issue lies in the actual implementation of all these limits. https://github.com/discord/discord-api-docs/pull/4694 shows how Discord expects rate limits to be handled.

For now, I think the more important part is to mostly rely on headers which can consistently tell your app how it should respond and handle a rate limit. — @shaydewael

Reliance on headers is also common in the industry, but these dynamic per-route rate limits (where a "route" is never actually defined; only implied as endpoint + method) are NOT. Neither is the failure to send those headers for certain endpoints. The issue — as you discovered — is that it's possible to hit a rate limit BEFORE sending 50 requests per second. However, rate limits cannot be discovered until you actually send a request.

This gives the API consumer 2 options:

  1. Send a single request to check for a rate limit header, and if one doesn't exist then constrain that request to the Global Rate Limit.
  2. Send the maximum amount of requests in a burst and retry the ones that are rate limited after.

The whole "get 429ed and then retry" method doesn't work that well, as it seems this is incredibly prone to getting the bot in a cascading failure mode that just spams the API server. — https://github.com/discord/discord-api-docs/issues/108

In either case, it's impossible to know the rate limit prior to sending a request which is why option 1 is a valid way to handle per-route rate limits, but it still isn't acceptable as a practice. This dynamic nature is highlighted by Discord which states:

Correct—there are some routes where this isn't true. "Often" is purposefully vague to indicate that you can't rely on a single pattern for all rate limits since they aren't all calculated in the same way, and each is subject to change over time. — @shaydewael

The lack of a rate limit pattern means that there are multiple ways that you — the developer — must handle rate limits in a library. It's not a surprise that many libraries choose to ignore some of them (e.g. Global Rate Limits, Emojis). This issue only occurs because of Discord's "dynamic route" implementation. There were many alternatives to this system, which isn't the main point of the post. However, some examples were giving each route a "cost" and limiting the amount a bot could spend per second; or provisioning more resources for certain endpoints. The current implementation is the easier option for Discord, but — in a similar manner to Slash Commands — has a poor execution (with regards to the first request of any route).

Rapptz was right.

switchupcb avatar Jul 11 '22 16:07 switchupcb

We are giving that decision (to send one or multiple requests at a route's first request) to the end user of our library through the use of a default bucket.

switchupcb avatar Jul 11 '22 17:07 switchupcb

I've been trying out some stuff and I now have a different implementation as a prototype:

import time
from collections import deque

import anyio


class GlobalRatelimit:
    def __init__(self) -> None:
        self._lock = anyio.Lock()

        # Timestamps of (up to) the last 50 requests made.
        self._tokens = deque(maxlen=50)

    async def wait(self) -> None:
        async with self._lock:
            now = time.perf_counter()

            if len(self._tokens) == self._tokens.maxlen:
                # 50 requests have already been made; wait until the oldest
                # of them is more than a second old.
                delay = self._tokens[0] + 1 - now
                if delay > 0:
                    await anyio.sleep(delay)

            self._tokens.append(time.perf_counter())

I was describing this behaviour above:

It doesn't matter when Discord's actual buckets really reset. As long as you have a type of timed semaphore which guarantees that there are not more than 50 requests per second, it doesn't matter much when the window opens/closes. No matter the drift in when you open/close the window, you should not exceed Discord's ratelimits

I understand now that this is what I was describing, but had not implemented. With the previous implementation it is possible to send 50 requests at the end of our window (but the opening of Discord's), then 50 requests at the start of the new window (but the end of Discord's). This results in Discord seeing 100 requests within a single window.

Unfortunately, I still got one or two requests which hit the global limit with this new implementation, every once in a while. I had to create another test which did hit the global limit, but still more infrequently than earlier. This handful of ratelimit violations must have had to do with the delay between me and Discord.


Partial is an understatement.

The sentence you quoted was referring to multiple paragraphs. Sorry for the confusion. My point was that I fall back to single-queued requests for two reasons, only one of which can be blamed on Discord.

The issue — as you discovered — is that it's possible to hit a rate limit BEFORE sending 50 requests per second

I am gonna assume "you" may have referred to me. Yes, the reason it is possible to hit the ratelimit before sending 50 requests per second is because of the route-specific ratelimits. The route-specific ratelimits are more commonly hit than the global ratelimit. You could almost completely forget about the global ratelimit as it so rarely comes into play.

This gives the API consumer 2 options:

  1. Send a single request to check for a rate limit header, and if one doesn't exist then constrain that request to the Global Rate Limit.
  2. Send the maximum amount of requests in a burst and retry the ones that are rate limited after.

Yes, the first one is generally speaking the most considerate. Send requests one-by-one until you have the ratelimiting information you need to be able to safely send concurrent requests. Requests sent one-by-one is how most libraries have worked for years as far as I know; taking the courage to send concurrent requests is, in my opinion, something extra/additional/bonus.

The whole "get 429ed and then retry" method doesn't work that well, as it seems this is incredibly prone to getting the bot in a cascading failure mode that just spams the API server.

In either case, it's impossible to know the rate limit prior to sending a request which is why option 1 is a valid way to handle per-route rate limits, but it still isn't acceptable as a practice.

That quote was taken from another time? At the time that this issue was created, as commented and discussed, many libraries sent requests without any form of ratelimiting or locking, apart from sleeping when they got a 429 response. I am confused as to why you bring this up.

There were many alternatives to this system, which isn't the main point of the post. However, some examples were giving each route a "cost" and limiting the amount a bot could spend per second; or provisioning more resources for certain endpoints.

What is done right now can be considered a variant of this. Different routes have different ratelimits matching their costs and chance of abuse. That said, had what you mentioned been implemented, we would be back at square one of not knowing the cost of all routes, since Discord would want to change it dynamically as they experience different types of abuse.

I do want to point out that, in the reality of the situation, Discord doesn't really have much to gain by making it easier to exceed the maximum possible requests. Removing the dynamic nature of the system, where each route has a chance to get some special treatment, means that instead of increasing ratelimits for specific usages, Discord would need to punish everyone. I suppose it is similar to how rules are commonly very vague; if a rule was too specific, it would be too easy to skirt the rules. Not to say that I agree with all of this, but it is what it is.


I was under the impression that this was meant to be constructive in solving some type of issue. Unfortunately, I now realize that I have wasted my time and this is just a place to complain - no one to blame but me. I will most likely not respond again.

Bluenix2 avatar Jul 11 '22 19:07 Bluenix2

I was under the impression that this was meant to be constructive in solving some type of issue. Unfortunately, I now realize that I have wasted my time and this is just a place to complain - no one to blame but me. I will most likely not respond again. — @Bluenix2

Lol. https://github.com/discord/discord-api-docs/issues/5144#issuecomment-1179633775 contains a working implementation of a Global Rate Limit Handler if you can read Go. However, I'm about to add the per-route rate limit handling (along with priority requests on 429), which may make it slightly more complex when I commit. I try to keep things as simple as possible.

However, I still need to finish the Discord Documentation PR before this issue is officially "solved".

switchupcb avatar Jul 11 '22 23:07 switchupcb

In addition to the above, https://discord.com/developers/docs/topics/gateway#rate-limiting can also be clarified.

Clients are allowed to send 120 gateway commands every 60 seconds, meaning you can send an average of 2 commands per second. Clients also have a limit of concurrent Identify requests allowed per 5 seconds. If you hit this limit, the Gateway will respond with an Opcode 9 Invalid Session.

Emphasis is placed on the fact that we can send an average of 2 commands per second, which would indicate a leaky bucket rate limit algorithm for Gateway commands. However, this is contradicted by the following sentence that states that clients can send 5 Identify requests per second. Either an exemption is being made to a leaky token bucket (that allows 2 commands/s) or this is another fixed window rate limit with an additional rate limit on Identify commands.

In addition, the following info is NOT included in the rate limit section for Gateway Rate Limits, but is included around Identify.

Clients are limited to 1000 IDENTIFY calls to the websocket in a 24-hour period. This limit is global and across all shards, but does not include RESUME calls.
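For completeness, a hedged sketch of a gateway command limiter under the fixed-window reading of the quoted text (the documentation does not actually confirm which algorithm is used):

import asyncio
import time


class GatewayCommandLimiter:
    # 120 gateway commands per 60-second window, treated as a fixed window.
    def __init__(self, limit: int = 120, window: float = 60.0) -> None:
        self._lock = asyncio.Lock()
        self._limit = limit
        self._window = window
        self._reset_at = 0.0
        self._remaining = limit

    async def acquire(self) -> None:
        async with self._lock:
            now = time.monotonic()
            if now >= self._reset_at:
                # Window expired (or first command): open a fresh window.
                self._reset_at = now + self._window
                self._remaining = self._limit
            elif self._remaining <= 0:
                # Window exhausted: sleep until it resets, then open a new one.
                await asyncio.sleep(self._reset_at - now)
                self._reset_at = time.monotonic() + self._window
                self._remaining = self._limit
            self._remaining -= 1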

switchupcb avatar Jul 21 '22 02:07 switchupcb

However, this is contradicted by the following sentence that states that clients can send 5 Identify requests per second

It's 1/5s (for bots in <150k guilds), not 5/1s, and is global to the bot, not per gateway connection

Zoddo avatar Jul 21 '22 22:07 Zoddo

Thanks for the correction @Zoddo.

switchupcb avatar Jul 21 '22 22:07 switchupcb

These are not actually described in the documentation but I have been able to deduce the types of rate limits Discord has: Global (Requests), Per Route (Requests), Per Resource (Requests), Per Resource Per Routes (Emoji Requests), Global (Gateway), Identify (Gateway). — https://github.com/switchupcb/disgo/pull/14#issuecomment-1191998174

Based on https://github.com/mennanov/limiters:

  • Global Rate Limit: Fixed Window (50 requests/s)
  • Per Route: Token Bucket
  • Per Resource: Token Bucket
  • Emoji (Derived Resource Rate Limit, aka Per Resource Per Routes): Token Bucket (4 requests/s per channel)
  • Gateway: Fixed Window (120 commands per 60 s)
  • Identify: Fixed Window (0.2 requests/s)

A source from Discord (screenshot not reproduced here) says they do not actually know (after reviewing this conclusion).

I will create the PR outline and allow an employee to finish the details such as to not provide misinformation.

switchupcb avatar Jul 21 '22 22:07 switchupcb

@shaydewael Do you have a list of all the routes with top-level resources which use per-resource rate limits? X-RateLimit-Scope is unfortunately only returned on 429'd requests.

switchupcb avatar Jul 21 '22 23:07 switchupcb

The issue is that there is no guarantee that top-level routes (which I assume are routes with guild, channel, webhook parameters coming first) are actually resource routes. This interpretation itself could be entirely wrong.

During calculation, per-route rate limits often account for top-level resources within the path using an identifier

switchupcb avatar Jul 22 '22 01:07 switchupcb

This statement is rather ambiguous because "often" implies that it does NOT apply to every request containing a top-level resource, or started by a top-level resource. The problem is that this statement is also the only sentence regarding Discord's Per Resource Rate Limits, barring X-RateLimit-Scope, which only indicates that a 429'd request hit a Per Resource Rate Limit. There is also no clarification of why the X-RateLimit-Scope calls Per Resource rate limits "shared", which could imply that multiple users can trigger them rather than just the bot. — https://github.com/switchupcb/disgo/issues/22

switchupcb avatar Jul 25 '22 07:07 switchupcb

Given that @jhgg is at least a Senior Staff Engineer at Discord, and also the initial announcer of Discord Rate Limits, he may also be able to confirm the question in https://github.com/discord/discord-api-docs/issues/5144#issuecomment-1192012368.

switchupcb avatar Jul 25 '22 07:07 switchupcb

@advaith1 Do you have any other information about classifying endpoints to per-resource rate limits prior to sending a request?

switchupcb avatar Aug 06 '22 22:08 switchupcb

A response was given (screenshot not reproduced here).

As a result, the pull request being made to address this issue can be resumed upon confirmation (via https://github.com/switchupcb/disgo/pull/19 logs).

switchupcb avatar Aug 10 '22 04:08 switchupcb

More importantly, encountering a 429 as a result of rate-limit bucket changes is inevitable due to Discord's implementation of rate-limit bucket discovery.

This should be fixed.

erkinalp avatar Sep 10 '22 07:09 erkinalp

I'm not sure exactly what this issue is about anymore, so I'm closing it. Please feel free to open a new issue if necessary, but please don't open new issues asking that we document per-resource limits.

yonilerner avatar Oct 04 '22 00:10 yonilerner