discord.py Bot Does Not Handle Rate Limit on Retrieving Message History

Summary

The bot does not automatically sleep on 429 errors if getting message history for lots of channels.

Reproduction Steps

My bot gets a list of all guilds it's part of and uses asyncio.gather to start a function that scrapes every message in every text channel of every guild that it's part of. This causes it to run 100+ functions asyncronously that request the messages of

Expected Results

On a rate limit, the bot will sleep and re-try.

Actual Results

[2020-09-09 15:38:23,805] {bot::on_error} {None::None::None::None} (ERROR): Error occurred in event handler on_ready
Traceback (most recent call last):
  File "/Users/<user>/.local/share/virtualenvs/PokestarBot-8lQ-e5KE/lib/python3.8/site-packages/discord/client.py", line 312, in _run_event
    await coro(*args, **kwargs)
  File "/Users/<user>/PycharmProjects/PokestarBot/bot_data/bot.py", line 184, in on_ready
    await self.get_all_stats()
  File "/Users/<user>/PycharmProjects/PokestarBot/bot_data/bot.py", line 475, in get_all_stats
    await asyncio.gather(*[self.get_guild_stats(guild) for guild in self.guilds])
  File "/Users/<user>/PycharmProjects/PokestarBot/bot_data/bot.py", line 491, in get_guild_stats
    await asyncio.gather(*[self.get_channel_stats(channel) for num, channel in enumerate(channels)])
  File "/Users/<user>/PycharmProjects/PokestarBot/bot_data/bot.py", line 494, in get_channel_stats
    async for message in channel.history(limit=None):
  File "/Users/<user>/.local/share/virtualenvs/PokestarBot-8lQ-e5KE/lib/python3.8/site-packages/discord/iterators.py", line 86, in __anext__
    msg = await self.next()
  File "/Users/<user>/.local/share/virtualenvs/PokestarBot-8lQ-e5KE/lib/python3.8/site-packages/discord/iterators.py", line 260, in next
    await self.fill_messages()
  File "/Users/<user>/.local/share/virtualenvs/PokestarBot-8lQ-e5KE/lib/python3.8/site-packages/discord/iterators.py", line 306, in fill_messages
    data = await self._retrieve_messages(self.retrieve)
  File "/Users/<user>/.local/share/virtualenvs/PokestarBot-8lQ-e5KE/lib/python3.8/site-packages/discord/iterators.py", line 326, in _retrieve_messages_before_strategy
    data = await self.logs_from(self.channel.id, retrieve, before=before)
  File "/Users/<user>/.local/share/virtualenvs/PokestarBot-8lQ-e5KE/lib/python3.8/site-packages/discord/http.py", line 255, in request
    raise HTTPException(r, data)
discord.errors.HTTPException: 429 Too Many Requests (error code: 0): You are being rate limited.

Checklist

[x] I have searched the open issues for duplicates.
[x] I have shown the entire traceback, if possible.
[x] I have removed my token from display, if visible.

System Information

Python v3.8.5-final
discord.py v1.4.1-final
aiohttp v3.6.2
system info: Darwin 19.6.0 Darwin Kernel Version 19.6.0: Thu Jun 18 20:49:00 PDT 2020; root:xnu-6153.141.1~1/RELEASE_X86_64

Sep 09 '20 19:09 PythonCoderAS

Judging by the traceback and where the exception is being raised, this should only occur if the request calling code has run out of retries. A retry happens in the following scenarios:

The request failed with a 500 status code.
The request is attempting again after being rate limited.
There was a ConnectionReset exception raised (your connection got terminated mid-request).

However I'm unsure how this happened without actual logs to help pin point the issue. The rate limit code should and does sleep in these cases and they're wrapped in a lock so even extreme forms of concurrency shouldn't affect it.

Sep 09 '20 23:09 Rapptz

@Rapptz I don't know how to get any more logs, since my program doesn't write any logs in-between. Furthermore, I have instead re-wrote my program to use a Queue and 10 concurrent coroutines to fetch from said Queue, and this seems to not lead to any more problems. I think the only way to reproduce is to try again, but perhaps maybe make a Guild with hundreds of channels and fill them with about 100 messages each for testing purposes.

Sep 10 '20 12:09 PythonCoderAS

@Rapptz: I've looked into this issue and I believe I have more information for you, as your label requested.

It looks like it may be an issue with how Discord.py is bucketing requests for rate limiting. The bucketing logic does not seem to match was the way the Discord servers are imposing rate limits anymore.

Here is an excerpt of HTTP response logs from the discord.http module from the beginning of a session that does highly concurrent message history fetching, like PythonCodeAS described:

2020-10-09 15:23:16,689 DEBUG:discord.http -- GET https://discord.com/api/v7/users/@me with None has returned 200
2020-10-09 15:23:16,730 DEBUG:discord.http -- GET https://discord.com/api/v7/gateway with None has returned 200
2020-10-09 15:23:17,385 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/225485111359963136/messages with None has returned 200
2020-10-09 15:23:17,397 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/127129788304326656/messages with None has returned 200
2020-10-09 15:23:17,411 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/653973560125161472/messages with None has returned 200
2020-10-09 15:23:17,414 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/217639319177789441/messages with None has returned 200
2020-10-09 15:23:17,425 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/734666631719944223/messages with None has returned 200
2020-10-09 15:23:17,426 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/758753655057940570/messages with None has returned 200
2020-10-09 15:23:17,428 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/734666345588981790/messages with None has returned 200
2020-10-09 15:23:17,432 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/330093801500704770/messages with None has returned 200
2020-10-09 15:23:17,436 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/496848480233455618/messages with None has returned 200
2020-10-09 15:23:17,438 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/714221806738997382/messages with None has returned 200
2020-10-09 15:23:17,439 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/466278256434479104/messages with None has returned 200
2020-10-09 15:23:17,441 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/218808598237544449/messages with None has returned 200
2020-10-09 15:23:17,446 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/496848688879108096/messages with None has returned 200
2020-10-09 15:23:17,449 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/265843339197153290/messages with None has returned 429
2020-10-09 15:23:17,449 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/322768388734582785/messages with None has returned 429
2020-10-09 15:23:17,451 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/244238416248569857/messages with None has returned 429
2020-10-09 15:23:17,452 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/244242213138071552/messages with None has returned 429
2020-10-09 15:23:17,452 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/407326714356826112/messages with None has returned 429
2020-10-09 15:23:17,452 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/623367334220005387/messages with None has returned 429
2020-10-09 15:23:17,452 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/368932846083112970/messages with None has returned 429
2020-10-09 15:23:17,453 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/757602062241497189/messages with None has returned 429
2020-10-09 15:23:17,453 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/390076684780568576/messages with None has returned 429
2020-10-09 15:23:17,453 DEBUG:discord.http -- GET https://discord.com/api/v7/channels/233619405223755777/messages with None has returned 429

Notice how HTTP 429 responses are being given for channels that have never seen an HTTP 200 response yet. It appears that Discord's current behavior is to apply a rate limit across all channels in a guild.

However, I observed that the Route class in discord.http assigns request rate-limiting buckets by channel ID, and the HTTPClient.logs_from method provides a channel ID such that the .../messages request is bucketed by channel for rate limiting. Since the Discord servers do not appear to be assigning a separate rate limit per channel anymore (or at least they have a guild-wide rate limit that is more restrictive), it would seem that this rate limit bucketing logic may need revising.

One last piece of information to note: I added some debug logging to see the actual HTTP 429 responses, and I noticed that the 429 responses' JSON bodies had global set to False. So whatever we're running into is not described by the server as a global rate limit, even if it is less granular than a channel-level rate limit.

Finally, it may be an unduly broad way of addressing this problem, but I tested removing channel IDs from consideration in the Route.bucket property method, so that rate buckets could never be more granular than guild level, and while there were still occasional 429s, they were relatively few.

Oct 09 '20 19:10 joell

As a follow up, it looks like the latest documentation on rate limits appears to hint at this:

Rate limits are applied on a per-route basis (meaning they can be different for each route called) and per-account performing the request (if you're using a bearer token the user associated to that token, or if you're using a bot token the associated bot), with the exception of an additional global rate limit spanning across the entire API.

(emphasis added is my own)

What we might be hitting here is the per-account rate limit, which is separate from the route limits and the global limit.

Additionally, the documentation notes:

"Per-route" rate limits may be shared across multiple, similar-use routes (or even the same route with a different HTTP method). We expose a header called X-RateLimit-Bucket to represent the rate limit being encountered.

It's possible that they might be rate limiting all channel .../messages requests in a single bucket. However, I did not see any X-RateLimit- headers in the HTTP responses Discord.py was receiving. Instead I saw only a Retry-After header.

Is it possible that these new rate limiting headers are only introduced in a newer version of that API than Discord.py is using?

Oct 09 '20 19:10 joell

@joell This is something I'm already aware about, it's the rate limit bucket header that I specifically requested for with other library developers a few years back. However, the library does not use it at the moment due to the existence of sub rate limits becoming more common and it being buggy to implement for those sub rate limits (the hash changes and the rate limit is different despite the route being the same).

What you're referring to as "per-account rate limit" is a concept that only exists in OAuth2, which the library does not use.

All in all, there is no new information here. The rate limit handling should have stopped this (and indeed, when testing locally it does).

Oct 10 '20 09:10 Rapptz

The rate limit handling has been significantly rewritten as of a few commits ago. I'm unsure if this issue will pop up again but if there are any issues feel free to open a new one.

Aug 17 '22 14:08 Rapptz