discord.js
Websocket Destroying Shard
Which package is this bug report for?
discord.js
Issue description
I have been struggling with some unexpected websocket problems.
My bot was recently added to a server with 1.3M members. When I attempt to fetch all members with await interaction.guild.members.fetch(), the shard hangs and is eventually destroyed. During this time, interactions fail for all commands. Logging the ws debug events, I see the output below. No other errors are thrown, and I don't observe any network issues.
[WS => Shard 2] Destroying shard
Reason: Zombie connection
Code: 4200
Recover: Resume
[WS => Shard 2] Connection status during destroy
Needs closing: true
Ready state: 1
Shard 2 reconnecting...
Shard 2 reconnecting
[WS => Shard 2] Connecting to wss://gateway-us-east1-d.discord.gg?v=10&encoding=json
[WS => Shard 2] Waiting for event hello for 60000ms
[WS => Shard 2] Preparing first heartbeat of the connection with a jitter of 0.3919126951911043; waiting 16166ms
[WS => Shard 2] Resuming session
resume url: wss://gateway-us-east1-d.discord.gg
sequence: 47900
shard id: 2
[WS => Shard 2] Invalid session; will attempt to resume: false
[WS => Shard 2] Destroying shard
Reason: Invalid session
Code: 1000
Recover: Reconnect
[WS => Shard 2] Connection status during destroy
Needs closing: true
Ready state: 1
[WS => Shard 2] Cancelled initial heartbeat due to #destroy being called
Shard 2 reconnecting...
Shard 2 reconnecting
[WS => Shard 2] Connecting to wss://gateway.discord.gg?v=10&encoding=json
[WS => Shard 2] Waiting for event hello for 60000ms
[WS => Shard 2] Preparing first heartbeat of the connection with a jitter of 0.4774250646629059; waiting 19693ms
[WS => Shard 2] Waiting for identify throttle
[WS => Shard 2] Identifying
shard id: 2
shard count: 6
intents: 33315
compression: none
Waiting for event ready for 15000ms
Shard received all its guilds. Marking as fully ready.
Shard 2 ready
[WS => Shard 2] First heartbeat sent, starting to beat every 41250ms
[WS => Shard 2] Heartbeat acknowledged, latency of 25ms.
Also, seemingly at random, a shard will be closed with code 1006 while reconnecting, with Reason: none. For reconnects, I think the reason should be Reason: Told to reconnect by Discord. There are no other logged errors at the time of reconnecting, and my network is stable. I'm not sure if this is related.
[WS => Shard 3] Heartbeat acknowledged, latency of 21ms.
Shard 3 reconnecting
Shard 3 reconnecting...
[WS => Shard 3] The gateway closed with an unexpected code 1006, attempting to resume.
[WS => Shard 3] Destroying shard
Reason: none
Code: 1006
Recover: Resume
[WS => Shard 3] Connection status during destroy
Needs closing: false
Ready state: 3
[WS => Shard 3] Connecting to wss://gateway-us-east1-b.discord.gg?v=10&encoding=json
[WS => Shard 3] Waiting for event hello for 60000ms
[WS => Shard 3] Preparing first heartbeat of the connection with a jitter of 0.9189543049977831; waiting 37906ms
[WS => Shard 3] Resuming session
resume url: wss://gateway-us-east1-b.discord.gg
sequence: 31654
shard id: 3
[WS => Shard 3] Resumed and replayed 4 events
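For anyone trying to reproduce logs like the ones above, here is a minimal sketch of how such output can be captured. It assumes the discord.js v14 client events debug, shardReconnecting, shardResume, shardReady, and shardDisconnect. The helper name attachShardLogging is hypothetical, and a plain EventEmitter stands in for the real client so the snippet runs without a bot token.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical helper: forwards gateway debug output and shard lifecycle
// changes to a logger. `client` is meant to be a discord.js Client, but any
// EventEmitter works for a dry run.
function attachShardLogging(client, log = console.log) {
  client.on('debug', (message) => log(message));
  client.on('shardReconnecting', (shardId) => log(`Shard ${shardId} reconnecting...`));
  client.on('shardResume', (shardId, replayedEvents) =>
    log(`Shard ${shardId} resumed, replayed ${replayedEvents} events`));
  client.on('shardReady', (shardId) => log(`Shard ${shardId} ready`));
  client.on('shardDisconnect', (closeEvent, shardId) =>
    log(`Shard ${shardId} disconnected with code ${closeEvent.code}`));
  return client;
}

// Dry run: a plain EventEmitter stands in for the client, and log lines are
// collected in an array instead of being printed.
const lines = [];
const fakeClient = attachShardLogging(new EventEmitter(), (line) => lines.push(line));
fakeClient.emit('shardReconnecting', 2);
fakeClient.emit('shardReady', 2);
```

With a real bot, the same function would be called once on the Client instance before login.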
Code sample
No response
Versions
- discord.js 14.11.0
Issue priority
High (immediate attention needed)
Which partials do you have configured?
No Partials
Which gateway intents are you subscribing to?
Guilds, GuildMembers, GuildWebhooks, GuildMessages, MessageContent
I have tested this issue on a development release
No response
same
Any update on this?
I just coded around it. Fetching all members in a server is discouraged in most cases, and the feature I was using it for was more about convenience than function.
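The comment above does not say how the workaround was written. As one possible approach (not necessarily what was done here), members can be paged over REST in bounded chunks instead of requesting the whole guild through the gateway. This sketch assumes guild.members.list({ limit, after, cache }) behaves as in discord.js v14 and returns a Map-like collection keyed by user ID. The helper forEachMemberPage and its options are hypothetical names.

```javascript
// Hypothetical helper: walks guild members page by page over REST.
// Assumes `guild.members.list()` accepts { limit, after, cache } and resolves
// to a Map-like collection keyed by user ID (as in discord.js v14).
async function forEachMemberPage(guild, onPage, { pageSize = 1000, maxPages = Infinity } = {}) {
  let after; // highest user ID seen so far; undefined for the first page
  let pages = 0;
  while (pages < maxPages) {
    const page = await guild.members.list({ limit: pageSize, after, cache: false });
    if (page.size === 0) break;
    await onPage(page);
    pages += 1;
    if (page.size < pageSize) break; // a short page means there is nothing left
    // IDs are numeric strings too large for Number, so compare them as BigInt.
    after = [...page.keys()].reduce((a, b) => (BigInt(a) > BigInt(b) ? a : b));
  }
  return pages;
}
```

Each page is processed and released before the next one is requested, so only one page is held at a time and the event loop gets a chance to handle interactions and heartbeats in between. Setting maxPages caps the work for very large guilds.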
This was discussed internally, but I forgot to comment here: there's nothing we can really do. This is a limitation of JS runtimes that we cannot work around. So many packets come in when you make that request that the event loop gets super busy. Even if we were to somehow efficiently offload a lot of the sync ops (JSON.parse) to worker threads, I feel like you'd still get a backlog of incoming events, and the event loop would still be quite busy just from all the EventEmitters firing constantly.
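To illustrate the effect described above, here is a small sketch in plain Node.js. It is not discord.js internals. It shows how a timer fires late when synchronous work such as repeated JSON.parse calls occupies the event loop. A heartbeat timer or the handler for an incoming heartbeat acknowledgement would presumably be delayed in the same way. The function name measureTimerDelay and the payload shape are made up for the demonstration.

```javascript
const { performance } = require('node:perf_hooks');

// Resolves with how many milliseconds late a timer fired when the event loop
// was kept busy with synchronous work for roughly `busyMs` milliseconds.
function measureTimerDelay(scheduledMs, busyMs) {
  return new Promise((resolve) => {
    const start = performance.now();
    setTimeout(() => resolve(performance.now() - start - scheduledMs), scheduledMs);

    // Simulated burst of synchronous packet handling: parse a made-up payload
    // over and over until `busyMs` has elapsed. The timer cannot fire until
    // this loop returns control to the event loop.
    const payload = JSON.stringify({
      members: Array.from({ length: 1000 }, (_, i) => ({ id: String(i) })),
    });
    while (performance.now() - start < busyMs) JSON.parse(payload);
  });
}

// Usage: a timer scheduled for 10 ms while the loop is blocked for 100 ms.
measureTimerDelay(10, 100).then((lateBy) => {
  console.log(`timer fired about ${Math.round(lateBy)} ms late`);
});
```

The lateness grows with the length of the synchronous burst, which is why a large enough flood of member chunks can starve unrelated work such as interaction handling.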
As an extreme and absurd solution: if you seriously need this working perfectly, you'd genuinely need a gateway written in a different language that can handle the load, and then use @discordjs/core with a message broker in the middle.
All of the above mostly applies to not getting the interaction events in time (and similar issues). Some improvements could arguably be made in the zombie connection detection area, but I'm skeptical of the benefits/correctness of the proposed changes we've discussed.