Discord.Net
[Bug]: Gateway infinite hang after a while.
Check The Docs
- [X] I double checked the docs and couldn't find any useful information.
Verify Issue Source
- [X] I verified the issue was caused by Discord.Net.
Check your intents
- [x] I double checked that I have the required intents.
Description
After the bot has been running for a while (anywhere from about 5 hours to 10 days), the Discord gateway hangs indefinitely, with the last message being "Disconnecting...":
This isn't always preceded by a blocking-handler warning. I'd like to know whether you have seen this issue before, whether it is already known, or whether it could just be a .NET issue.
The bot runs on net48 with LangVersion 10, on a Windows Server 2019 VM with 8 vCPUs and 8 GiB of memory, 400 MiB/s down and 40 MiB/s up. Sharding is disabled.
Version
v3.0.0-dev-dev
Working Version
No response
Logs
[2022-02-21T15:43:42.5093Z][5018][0014][56700.3657842][win32nt-amd64][4.0.30319.42000][1.0.8079.2135][Release][10.128.29.28][JFK-01-DApp181][dapp181-dec.distrubuted.jfk-01-us-east-01.mfdlabs.local][bot][INFO] Renewing vault client's token, '_____'
[2022-02-21T15:53:11.6539Z][5018][005f][57269.4995718][win32nt-amd64][4.0.30319.42000][1.0.8079.2135][Release][10.128.29.28][JFK-01-DApp181][dapp181-dec.distrubuted.jfk-01-us-east-01.mfdlabs.local][bot][INFO] DiscordInternal-INFO-Gateway: Disconnecting
[2022-02-21T15:53:11.6559Z][5018][005f][57269.5013986][win32nt-amd64][4.0.30319.42000][1.0.8079.2135][Release][10.128.29.28][JFK-01-DApp181][dapp181-dec.distrubuted.jfk-01-us-east-01.mfdlabs.local][bot][INFO] DiscordInternal-INFO-Gateway: Disconnected
[2022-02-21T15:53:12.6609Z][5018][0014][57270.5064444][win32nt-amd64][4.0.30319.42000][1.0.8079.2135][Release][10.128.29.28][JFK-01-DApp181][dapp181-dec.distrubuted.jfk-01-us-east-01.mfdlabs.local][bot][INFO] DiscordInternal-INFO-Gateway: Connecting
[2022-02-21T15:53:15.9079Z][5018][006a][57273.7533737][win32nt-amd64][4.0.30319.42000][1.0.8079.2135][Release][10.128.29.28][JFK-01-DApp181][dapp181-dec.distrubuted.jfk-01-us-east-01.mfdlabs.local][bot][WARNING] DiscordInternal-WARNING-Gateway: A MessageReceived handler is blocking the gateway task.
[2022-02-21T15:53:42.6645Z][5018][0042][57300.5093878][win32nt-amd64][4.0.30319.42000][1.0.8079.2135][Release][10.128.29.28][JFK-01-DApp181][dapp181-dec.distrubuted.jfk-01-us-east-01.mfdlabs.local][bot][INFO] DiscordInternal-INFO-Gateway: Disconnecting
Sample
No response
If needed, I can supply the exact Discord.Net build I am using.
The main issue with this is that it's not consistent.
A longer log trace should be provided as to why the client disconnects here. I assume this is because of a regular reconnection? If not, please include the disconnection reason.
In any case, please also cover your MessageReceived handler, as this is what is holding up the gateway and ultimately locking it.
@Rozen4334 As I stated, it doesn't happen only because of the MessageReceived handler. I should have also said that there's no error logged for this. I believe I can enable more verbose logging, but you'll have to wait a while for the more detailed exception.
As a follow-up to my last message: the deployment with debug logging enabled is now live, and I will report back here when I get the exception.
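For anyone else trying to capture the same detail, a minimal sketch of raising the log level is shown here. It assumes the bot constructs its own `DiscordSocketClient` and already forwards the `Log` event to a logger; the variable names are illustrative only.

```csharp
using Discord;
using Discord.WebSocket;

// Assumption: the bot creates the client itself and wires client.Log elsewhere.
var config = new DiscordSocketConfig
{
    // Emit Debug-level messages in addition to Info and Warning.
    LogLevel = LogSeverity.Debug
};
var client = new DiscordSocketClient(config);
```

Debug output is considerably noisier, so it may be worth enabling it only on the deployment being used to reproduce the hang.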
@Rozen4334 I am back. It happened because of a skipped heartbeat, and it doesn't recover.
Try using 3.3.2; we made some changes to the internals in the 3.x versions.
@quinchs I think it may have been the thing I dismissed :/. I will stage that change and see whether it fixes the problem.
Experiencing the same issue, though I don't seem to be able to keep a connection for longer than 24 hours.
19:35:59 Discord Discord.Net v3.4.1 (API v9)
19:35:59 Gateway Connecting
19:36:00 Gateway Connected
19:36:02 Gateway Ready
21:19:49 Gateway Discord.WebSocket.GatewayReconnectException: Server missed last heartbeat
at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext()
21:19:49 Gateway Disconnecting
21:19:49 Gateway Disconnected
21:19:50 Gateway Connecting
21:20:05 Gateway Connected
21:20:05 Gateway Resumed previous session
21:21:28 Gateway Discord.WebSocket.GatewayReconnectException: Server missed last heartbeat
at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext()
21:21:28 Gateway Disconnecting
21:21:28 Gateway Disconnected
21:21:29 Gateway Connecting
21:21:29 Gateway Connected
21:21:29 Gateway Resumed previous session
22:21:42 Gateway Discord.WebSocket.GatewayReconnectException: Server requested a reconnect
at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext()
22:21:42 Gateway Disconnecting
22:21:42 Gateway Disconnected
22:21:43 Gateway Connecting
22:21:44 Gateway Connected
22:21:44 Gateway Resumed previous session
00:15:01 One or more errors occurred. (The server responded with error 500: 500: Internal Server Error)
at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
at System.Threading.Tasks.Task.Wait()
at Program.<>c__DisplayClass0_0.<<Main>$>b__28() in Program.cs:line 55
at Program.<>c__DisplayClass0_0.<<Main>$>b__31() in Program.cs:line 57
at A.E(Action t) in Program.cs:line 241
00:15:51 Gateway Discord.WebSocket.GatewayReconnectException: Server missed last heartbeat
at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext()
00:15:51 Gateway Disconnecting
00:15:51 Gateway Disconnected
00:15:52 Gateway Connecting
00:17:32 Gateway Disconnecting
00:17:32 Gateway Disconnected
It should be noted that the DiscordSocketClient in question doesn't actually appear to fully disconnect. While the bot account does go offline and cannot respond to slash commands, it is still able to edit previous messages.
I managed to keep it up for around 7 days and 9 hours until it eventually just hung
I have slowly stripped away my testing bot to the following code:
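// Minimal repro; token is assumed to be defined elsewhere.
using (var dc = new DiscordSocketClient(new DiscordSocketConfig { GatewayIntents = (GatewayIntents)3 })) // 3 = Guilds | GuildMembers
{
    // Log expects a handler that returns a Task, so wrap the synchronous write.
    dc.Log += lm => { Console.WriteLine(lm); return Task.CompletedTask; };
    await dc.LoginAsync((TokenType)1, token); // 1 = TokenType.Bot
    await dc.StartAsync();
    await Task.Delay(-1); // keep the process alive indefinitely
}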
I still receive this error. The gap between the final Connecting and Disconnecting is consistently 90-120 seconds.
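As a quick check of that gap against the log above, the timestamps can be subtracted directly. This is only a small sketch; `LogGap` is a hypothetical helper name and it assumes both timestamps fall on the same day.

```csharp
using System;
using System.Globalization;

public static class LogGap
{
    // Difference between two "HH:mm:ss" timestamps taken from the same day.
    public static TimeSpan Between(string earlier, string later)
    {
        return TimeSpan.Parse(later, CultureInfo.InvariantCulture)
             - TimeSpan.Parse(earlier, CultureInfo.InvariantCulture);
    }
}
```

For the final pair in the v3.4.1 log, `LogGap.Between("00:15:52", "00:17:32")` gives 100 seconds, which is inside the reported range.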
Any self-inflicted action appears to work properly (including sending messages). Any outside event/trigger is lost.
I will investigate this further. If possible, could you find out which post-3.0 version can stay up?
@quinchs We don't use the NuGet version; we build from source, so our version would be v3.0.0-dev-dev.
Using v3.3.1, there haven't been any problems for over 36 hours. I'll keep it running overnight, but it appears to be a regression from v3.4 in some way...
We've had ours up for about 155 hours, so we really have no clue what the problem is. We will try to determine which commit we are on.
Edit: Well according to the PR that upgraded it, the version is 3.4.1
So close... lasted 43 hours. Cleansed output:
07:51:15 Discord Discord.Net v3.3.1 (API v9)
07:51:15 Gateway Connecting
07:51:16 Gateway Connected
07:51:16 Gateway Ready
... 41 hours later ...
01:10:06 Gateway Discord.WebSocket.GatewayReconnectException: Server missed last heartbeat
01:10:29 Gateway System.Net.Http.HttpRequestException: Name or service not known (discord.com:443)
01:10:40 Gateway Resumed previous session
03:52:16 Gateway Discord.WebSocket.GatewayReconnectException: Server missed last heartbeat
at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext()
03:52:16 Gateway Disconnecting
03:52:17 Gateway Disconnected
03:52:18 Gateway Connecting
03:53:58 Gateway Disconnecting
03:53:58 Gateway Disconnected
Still using the code from my previous comment. Moving to v3.2.1 to test and see if it works. It might also be worth mentioning that DSharpPlus also has lots of heartbeat failures, but never at the same time as any Discord.Net errors.
Edit: v3.2.1 lasted 17 hours; reverting to v3.1.0
Once I merge #2212, I will be able to create an override for you all to diagnose or attempt to fix this, as I can't reproduce it locally. All my bots have been running for more than a month.
@quinchs I work with @nkpetko. The uptime mentioned above is still increasing and we haven't had an issue like this since, so we are wondering whether v3.4.1 truly fixed it. I'd also like to extend this issue with an unrelated question: when using MessageReferences, is there any way to determine whether the referenced message has been deleted?
- Jakob
is there any way when using MessageReferences to somehow determine if the message that is being referenced is deleted or not?
You can attempt to fetch the message using the REST client; if it returns null, it no longer exists. Are you sending a message with a reference or checking a pre-existing message?
We were thinking of using the REST client, but were worried about speed. What we do right now is just send the message with the reference; it will throw if the referenced message doesn't exist.
Facing a similar issue! By that, I mean D.NET stops receiving certain events (notably role and guild user updates) after being up for an extended period of time. Might be related.
@jvalara @nkpetko Could you attempt to run a fix branch in your code and let me know how it goes? thanks.
Any news? Seems to be affecting bot's ability to receive interaction events as well
@quinchs Sorry for the late reply, we've been backed up with work. We'll take a look at it today and get back to you when it decides to have issues.
@quinchs I don't know anymore; it seems to have fixed itself. We recently migrated from our own source build to the NuGet package, which posed no issues.
We've seen 100% uptime across the board with zero fatal alerts fired since we last deployed 2022.04.02-00.54.27_master_95352c9 to our latest release 2022.07.01-20.39.46_master_3de09e8.
I will report back again if we see any difference in this.
I've just got this error on v3.7.2 with nothing new in the stack trace that hasn't already been provided.
I stand corrected: in the last 2 months there were 42 fatal results from health checkers reporting the bot as inaccessible while the process was still running. Crash reporters (which don't only catch crashes) reported failures to connect, which the hang detector followed up with reports of thread blocking (that is, a single thread blocks forever and never retries).
Also keep in mind that the bot back in that example was only at around 65 guilds. It is now at over 400.
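A health check of the kind described above can be reduced to a small amount of state. The sketch below is a hypothetical, library-free illustration and not the checker actually used in that deployment; `ConnectionWatchdog` and the threshold value are assumptions.

```csharp
using System;

// Hypothetical watchdog: tracks how long the client has been away from the
// connected state and reports when that exceeds a threshold.
public sealed class ConnectionWatchdog
{
    private readonly TimeSpan _maxDowntime;
    private DateTimeOffset? _downSince;

    public ConnectionWatchdog(TimeSpan maxDowntime)
    {
        _maxDowntime = maxDowntime;
    }

    // Call periodically with the current connection status.
    // Returns true once the client has been not-connected for longer than the threshold.
    public bool IsHung(bool isConnected, DateTimeOffset now)
    {
        if (isConnected)
        {
            _downSince = null; // any successful connection resets the timer
            return false;
        }

        if (_downSince == null)
        {
            _downSince = now; // first observation of this outage
        }

        return now - _downSince.Value > _maxDowntime;
    }
}
```

With Discord.Net, `isConnected` could be fed from `client.ConnectionState == ConnectionState.Connected` on a timer. What to do once `IsHung` returns true (restart the client or the whole process) is left open here, since the thread does not settle on a fix.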
@quinchs I'm test-running the fix branch and so far nothing has changed; after roughly 4 hours it deadlocks.
However, I've noticed that it writes "A MessageReceived handler is blocking the gateway task." after every Disconnecting caused by a Discord.WebSocket.GatewayReconnectException: Server missed last heartbeat.
It does not write that warning when disconnecting because of a Discord.WebSocket.GatewayReconnectException: Server requested a reconnect.
"A MessageReceived handler is blocking the gateway task" implies that your message handler code is blocking the socket gateway code. Make sure that your handlers don't block if possible.
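One common way to avoid blocking is to return from the handler immediately and run the real work in the background. The sketch below is a minimal, library-free illustration of that pattern; `HandlerOffload.RunInBackground` and its `onError` callback are hypothetical names, not part of Discord.Net.

```csharp
using System;
using System.Threading.Tasks;

public static class HandlerOffload
{
    // Wraps a slow handler so the caller (for example the gateway task) gets a
    // completed Task back immediately while the work continues on the thread pool.
    public static Func<T, Task> RunInBackground<T>(Func<T, Task> handler, Action<Exception> onError)
    {
        return arg =>
        {
            _ = Task.Run(async () =>
            {
                try
                {
                    await handler(arg);
                }
                catch (Exception ex)
                {
                    // Exceptions on the background task would otherwise go unobserved.
                    onError(ex);
                }
            });
            return Task.CompletedTask;
        };
    }
}
```

Assuming a `DiscordSocketClient` named `client` and a handler `HandleMessageAsync(SocketMessage)`, it could be attached as `client.MessageReceived += HandlerOffload.RunInBackground<SocketMessage>(HandleMessageAsync, ex => Console.WriteLine(ex));`. The trade-off is that messages are no longer guaranteed to be processed in arrival order.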
I've removed my InteractionHandler and will let the bot run for another day to see whether it still behaves the same when the deadlock occurs.