disnake
feat: pass deserialization exceptions in gateway event handler to loop exception handler
Summary
Changes DiscordWebSocket.received_message to pass payload deserialization exceptions to the running loop's exception handler instead of letting them propagate up and crash the shard/bot.
This prevents bots from outright crashing if an event couldn't be parsed. Ideally, deserializing the payloads should always work, but history has shown that unannounced API changes unfortunately happen from time to time (even if accidental, like text-in-voice in December :^) ), and the library can't always perfectly account for that.
Note that this is really more of a proposal; feel free to suggest alternative implementations or ideas.
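For illustration, here is the rough shape of the approach: catch the deserialization error and forward it to the loop's exception handler instead of letting it escape the gateway read loop. This is a simplified sketch; _decode and _parse_and_dispatch are stand-ins, not the actual received_message internals:

```python
import asyncio

async def received_message(self, msg, /):
    payload = self._decode(msg)  # stand-in for the existing decompress/JSON step
    try:
        # deserialize the event payload and dispatch it as before
        self._parse_and_dispatch(payload)
    except Exception as exc:
        # instead of letting the error propagate and take down the shard,
        # hand it to the running loop's exception handler
        asyncio.get_running_loop().call_exception_handler(
            {
                "message": f"Failed to deserialize gateway event {payload.get('t')!r}",
                "exception": exc,
            }
        )
```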
Checklist
- [x] If code changes were made, then they have been tested
- [ ] I have updated the documentation to reflect the changes
- [x] I have formatted the code properly by running task lint
- [x] I have type-checked the code by running task pyright
- [ ] This PR fixes an issue
- [x] This PR adds something new (e.g. new method or parameters)
- [x] This PR is a breaking change (e.g. methods or parameters removed/renamed)
- [ ] This PR is not a code change (e.g. documentation, README, ...)
Not exactly sure which error handler this should redirect to, e.g. should an on_error be dispatched, or if this should have an opt-out option, as someone might have code reliant on the old system.
> Not exactly sure which error handler this should redirect to, e.g. should an on_error be dispatched,
Dispatching on_error is difficult since we'd need to await it, blocking further gateway event handling. Running it in a separate task isn't an option either since you wouldn't be able to access the exception in the task using sys.exc_info like usual.
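To illustrate the sys.exc_info problem: by the time a separately created task runs, the except block's exception state is no longer visible from inside it (toy example, not library code):

```python
import asyncio
import sys

async def on_error_handler():
    # runs as its own task, outside the except block's context
    print(sys.exc_info())  # (None, None, None), the original exception is gone

async def main():
    try:
        raise ValueError("could not deserialize payload")
    except ValueError:
        asyncio.create_task(on_error_handler())
        await asyncio.sleep(0)  # give the task a chance to run

asyncio.run(main())
```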
> or if this should have an opt-out option, as someone might have code reliant on the old system.
I can't really think of anything that would break due to this change, but I'm open to marking it as breaking just in case
I ended up causing an issue where the parse_message_create event could not deserialise the message. This caused every single message to raise an exception, and by default I don't think that's desired behavior.
That still seems better than outright crashing the bot on the first exception, you're not going to have a good time either way :p
What we might be able to do is add an argument for controlling the behavior here: default to off and opt-in, and in the future we can default this to True. Note in the documentation that this is not finalised and could change between releases.
Adding a toggle for this sounds good, though it should be opt-out (mainly for lib dev, I suppose?) instead of opt-in, since the goal of this was to have most/all clients protected from unannounced breaking API changes (like the automod change a few days ago) or other mishaps.
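A toggle like that could, for example, be a client-level flag; the parameter name below is purely illustrative and not necessarily what this PR ends up exposing:

```python
import disnake

# hypothetical opt-out flag: restore the old behavior of letting
# deserialization errors propagate (and crash the shard)
client = disnake.Client(enable_gateway_error_handler=False)
```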
*It gets worse in that some exceptions in an event are fine to continue but others are not.
Could you elaborate on that? Not quite sure which exceptions this is about.
I still don't see a toggle :P
> > *It gets worse in that some exceptions in an event are fine to continue but others are not.
>
> Could you elaborate on that? Not quite sure which exceptions this is about.
This is about some errors in some bots without intents, mostly issues that @Skelmis has discovered: we shouldn't hide that message creation will error, but I'm not entirely sure. It is possible to glitch the bot out in a way that errors on every message creation, and at that point I believe the bot is too unstable to continue operating. However, your mileage may vary, and this could be a beneficial addition. We could add it, make it opt-out by default, and change it to opt-in if we get reports (but honestly I don't think we will).
I personally feel it's better to raise an error on every exception vs crashing the shard/bot. This is because, while it's a pain to have that many exceptions, it would still allow other aspects of the bot to function vs outright crashing.
N.b. this also comes into effect in environments where a developer has, say, automated Docker restarts. Personally I use them for when the server goes down and comes back without me noticing, but in the case of an exception resulting in a process restart, that would very quickly eat into the session start limit, which is not good.
> I personally feel it's better to raise an error on every exception vs crashing the shard/bot. This is because, while it's a pain to have that many exceptions, it would still allow other aspects of the bot to function vs outright crashing.
That makes sense, but I'm concerned some developers might ignore or miss them if they have other logs enabled. However, these concerns could be alleviated by adding a default logging configuration that doesn't require the user to set up their own loggers if they want to use the default.
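As a sketch of what such a default logging configuration might amount to (just the general idea of installing a handler when the user hasn't configured one; not what the library actually does):

```python
import logging

def setup_default_logging() -> None:
    # only install a handler if the user hasn't configured logging themselves,
    # so forwarded gateway errors still show up somewhere visible
    root = logging.getLogger()
    if not root.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        root.addHandler(handler)
        root.setLevel(logging.INFO)
```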
> N.b. this also comes into effect in environments where a developer has, say, automated Docker restarts. Personally I use them for when the server goes down and comes back without me noticing, but in the case of an exception resulting in a process restart, that would very quickly eat into the session start limit, which is not good.
Exponential back-off is your friend!
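For reference, in-process exponential backoff around restarts/reconnects looks roughly like this (generic sketch; start_bot is a placeholder, and this is separate from the library's own reconnect logic):

```python
import asyncio
import random

async def run_with_backoff(start_bot, max_delay: float = 300.0) -> None:
    delay = 1.0
    while True:
        try:
            await start_bot()  # whatever starts/reconnects the bot
            delay = 1.0        # reset after a clean shutdown
        except Exception:
            # wait before retrying so repeated failures don't burn
            # through the session start limit
            await asyncio.sleep(delay + random.uniform(0, 1))
            delay = min(delay * 2, max_delay)
```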
You could possibly send the exceptions to stderr rather than on_error? To ensure they show up.
Although possibly a new event could be warranted, on_critical_error where: This event is dispatched for errors originating outside of the control of the library.
.. note:: Ideally you should never need this event, as it is reserved for serious issues such as API regression or plausible process crashes.
And exponential backoff how? It's a full container restart. What you're suggesting appears to be something I'd need to implement in-process.
> You could possibly send the exceptions to stderr rather than on_error? To ensure they show up.
That's fair, we need to redo some of our logging and printing anyways.
> Although possibly a new event could be warranted, on_critical_error where: This event is dispatched for errors originating outside of the control of the library.
That might be useful, but perhaps on_gateway_error is a better name. @shiftinv thoughts?
> .. note:: Ideally you should never need this event, as it is reserved for serious issues such as API regression or plausible process crashes.
> And exponential backoff how? It's a full container restart. What you're suggesting appears to be something I'd need to implement in-process.
This was a mention about your set-up; it's not a great idea to always restart on an error ~~although I can't really comment much on that since I used to do that.~~
I think it's better to catch critical errors instead of crashing the client, because if some developers encounter a regularly appearing critical error, they won't be able to ignore it unless they monkeypatch the lib internals. Speaking of an event for this - yes, we should provide that, because this will allow bot developers to decide whether they want to do some specific actions on each critical error (e.g. logging out or modifying the event payload).
> > You could possibly send the exceptions to stderr rather than on_error? To ensure they show up.
>
> That's fair, we need to redo some of our logging and printing anyways.
>
> > Although possibly a new event could be warranted, on_critical_error where: This event is dispatched for errors originating outside of the control of the library.
>
> That might be useful, but perhaps on_gateway_error is a better name. @shiftinv thoughts?
>
> > .. note:: Ideally you should never need this event, as it is reserved for serious issues such as API regression or plausible process crashes.
>
> > And exponential backoff how? It's a full container restart. What you're suggesting appears to be something I'd need to implement in-process.
>
> This was a mention about your set-up; it's not a great idea to always restart on an error ~~although I can't really comment much on that since I used to do that.~~
Technically our current restart policy is None due to the issue we've faced, but the point stands. Not all users will run a perfect setup, so ideally the library would at least provide some form of assistance in that regard.
> I think it's better to catch critical errors instead of crashing the client, because if some developers encounter a regularly appearing critical error, they won't be able to ignore it unless they monkeypatch the lib internals. Speaking of an event for this - yes, we should provide that, because this will allow bot developers to decide whether they want to do some specific actions on each critical error (e.g. logging out or modifying the event payload).
I agree with this, monkey patching library internals just to try to stop our bot crashing wasn't exactly a highlight of the week.
As to what kind of event, I still think it should be a new event which is dispatched (and prints to stderr by default), but after that I feel that would be enough from the library's side in terms of prevention. Anything further and it's probably just a case of the end developer providing logs to Disnake in order to fix upstream if it's a Discord issue; otherwise the developer can choose to handle it as they please.
> > Although possibly a new event could be warranted, on_critical_error where: This event is dispatched for errors originating outside of the control of the library.
>
> That might be useful, but perhaps on_gateway_error is a better name. @shiftinv thoughts?
The loop's default exception handler would always call logger.error, so it'd show up even if logging isn't configured, but a separate event is still a good idea.
It's implemented in https://github.com/DisnakeDev/disnake/pull/401/commits/9fd26d24f3559ffd830bf8699e29030a133347da, still need to make the entire thing configurable though.
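For completeness, forwarded errors can already be intercepted on the user side with a custom loop exception handler (plain asyncio, not a disnake API; the dedicated event discussed above would be the nicer interface once finalised):

```python
import asyncio
import logging

log = logging.getLogger(__name__)

def gateway_exception_handler(loop: asyncio.AbstractEventLoop, context: dict) -> None:
    # receives the context dict passed to loop.call_exception_handler();
    # defer anything without an exception to the default handler
    exc = context.get("exception")
    if exc is None:
        loop.default_exception_handler(context)
        return
    log.error("Gateway error: %s", context.get("message"), exc_info=exc)

async def main() -> None:
    asyncio.get_running_loop().set_exception_handler(gateway_exception_handler)
    # ... start the client here ...

asyncio.run(main())
```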
> I still don't see a toggle :P
@onerandomusername check again c: This PR should now be ready again, pretty sure I incorporated all suggestions.