Websocket Destroying Shard

Open CaptainPlantain opened this issue 2 years ago • 1 comments
Which package is this bug report for?

discord.js

Issue description

I have been struggling with some unexpected websocket problems.

My bot was recently added to a server that has 1.3M members. When I attempt to fetch all members with await interaction.guild.members.fetch(), the shard hangs and is eventually destroyed. During this time, interactions fail for all commands. Logging the WS debug messages, I see the output below. No other errors are thrown, and I don't observe any network issues.

[WS => Shard 2] Destroying shard
 	Reason: Zombie connection
 	Code: 4200
 	Recover: Resume
 [WS => Shard 2] Connection status during destroy
	Needs closing: true
 	Ready state: 1
 Shard 2 reconnecting...
 Shard 2 reconnecting
 [WS => Shard 2] Connecting to wss://gateway-us-east1-d.discord.gg?v=10&encoding=json
 [WS => Shard 2] Waiting for event hello for 60000ms
 [WS => Shard 2] Preparing first heartbeat of the connection with a jitter of 0.3919126951911043; waiting 16166ms
 [WS => Shard 2] Resuming session
 	resume url: wss://gateway-us-east1-d.discord.gg
 	sequence: 47900
 	shard id: 2
 [WS => Shard 2] Invalid session; will attempt to resume: false
 [WS => Shard 2] Destroying shard
 	Reason: Invalid session
 	Code: 1000
 	Recover: Reconnect
 [WS => Shard 2] Connection status during destroy
 	Needs closing: true
 	Ready state: 1
 [WS => Shard 2] Cancelled initial heartbeat due to #destroy being called
 Shard 2 reconnecting...
 Shard 2 reconnecting
 [WS => Shard 2] Connecting to wss://gateway.discord.gg?v=10&encoding=json
[WS => Shard 2] Waiting for event hello for 60000ms
[WS => Shard 2] Preparing first heartbeat of the connection with a jitter of 0.4774250646629059; waiting 19693ms
 [WS => Shard 2] Waiting for identify throttle
 [WS => Shard 2] Identifying
 	shard id: 2
 	shard count: 6
 	intents: 33315
 	compression: none
 Waiting for event ready for 15000ms
 Shard received all its guilds. Marking as fully ready.
 Shard 2 ready
 [WS => Shard 2] First heartbeat sent, starting to beat every 41250ms
 [WS => Shard 2] Heartbeat acknowledged, latency of 25ms.

Also, seemingly at random, a shard will be closed with code 1006 while reconnecting, with Reason: none. For reconnects, I think the reason should be Reason: Told to reconnect by Discord. No other errors are logged at the time of reconnecting, and my network is stable. I'm not sure if this is related.

 [WS => Shard 3] Heartbeat acknowledged, latency of 21ms.
 Shard 3 reconnecting
 Shard 3 reconnecting...
 [WS => Shard 3] The gateway closed with an unexpected code 1006, attempting to resume.
 [WS => Shard 3] Destroying shard
 	Reason: none
 	Code: 1006
 	Recover: Resume
 [WS => Shard 3] Connection status during destroy
 	Needs closing: false
 	Ready state: 3
 [WS => Shard 3] Connecting to wss://gateway-us-east1-b.discord.gg?v=10&encoding=json
 [WS => Shard 3] Waiting for event hello for 60000ms
 [WS => Shard 3] Preparing first heartbeat of the connection with a jitter of 0.9189543049977831; waiting 37906ms
 [WS => Shard 3] Resuming session
 	resume url: wss://gateway-us-east1-b.discord.gg
 	sequence: 31654
 	shard id: 3
 [WS => Shard 3] Resumed and replayed 4 events
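
For anyone trying to reproduce logs like the ones above, here is a minimal sketch of how such output can be captured. It assumes that, in discord.js v14, the "[WS => Shard N]" lines are delivered through the client's debug event and that the shard lifecycle events (shardReconnecting, shardResume, shardDisconnect) are emitted on the client. The helper name attachGatewayLogging is hypothetical, and a plain EventEmitter stands in for the client so the snippet runs without a bot token.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical helper: attach listeners for gateway-related client events.
// `client` is expected to be a discord.js Client, but any EventEmitter works.
function attachGatewayLogging(client, log = console.log) {
  // Assumption: WS debug lines arrive on 'debug' with a "[WS =>" prefix.
  client.on('debug', (message) => {
    if (message.includes('[WS =>')) log(message);
  });
  client.on('shardReconnecting', (shardId) => log(`Shard ${shardId} reconnecting`));
  client.on('shardResume', (shardId, replayedEvents) =>
    log(`Shard ${shardId} resumed, replayed ${replayedEvents} events`));
  client.on('shardDisconnect', (closeEvent, shardId) =>
    log(`Shard ${shardId} disconnected with code ${closeEvent.code}`));
  return client;
}

// Demo: a plain EventEmitter stands in for the real client.
const demoLines = [];
const demoClient = attachGatewayLogging(new EventEmitter(), (line) => demoLines.push(line));
demoClient.emit('debug', '[WS => Shard 2] Destroying shard');
demoClient.emit('debug', 'Unrelated debug message');
demoClient.emit('shardResume', 3, 4);
```

With a real bot, the same function would be called once on the Client instance before login.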

Code sample

No response

Versions

  • discord.js 14.11.0

Issue priority

High (immediate attention needed)

Which partials do you have configured?

No Partials

Which gateway intents are you subscribing to?

Guilds, GuildMembers, GuildWebhooks, GuildMessages, MessageContent

I have tested this issue on a development release

No response

CaptainPlantain avatar Jun 02 '23 19:06 CaptainPlantain

same

l1v0n1 avatar Jun 15 '23 19:06 l1v0n1

Any update on this?

scottbucher avatar Jul 11 '23 00:07 scottbucher

I just coded around it. Fetching all members in a server is discouraged in most cases, and the feature I was using it for was more about convenience than function.
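
The exact workaround isn't shown here, but one way to avoid a full fetch is to request only the members you actually need. The sketch below is an illustration under assumptions, not the code used in this bot: findMembers is a hypothetical helper, and it relies on guild.members.fetch() accepting an options object with user (one or more IDs) or query plus limit.

```javascript
// Hypothetical helper: fetch a targeted subset of members instead of the whole guild.
// `guild` is expected to be a discord.js Guild.
async function findMembers(guild, { userIds = [], query = '', limit = 10 } = {}) {
  if (userIds.length > 0) {
    // Fetch only the listed users.
    return guild.members.fetch({ user: userIds });
  }
  if (query) {
    // Search by name, capped at `limit` results.
    return guild.members.fetch({ query, limit });
  }
  // Refuse to fall back to an unbounded fetch of every member.
  throw new Error('Provide userIds or a query instead of fetching every member');
}
```

Inside a command handler this might look like `const members = await findMembers(interaction.guild, { query: 'capt', limit: 5 });`. The point of the design is that the unbounded call is never made by accident.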

CaptainPlantain avatar Jul 11 '23 00:07 CaptainPlantain

This was discussed internally, but I forgot to comment here: there's nothing we can really do. This is a limitation of JS runtimes that we cannot work around. So many packets come in when you make that request that the event loop gets super busy. Even if we somehow efficiently offloaded a lot of the sync ops (JSON.parse) to worker threads, I feel like you'd still get a backlog of incoming events, and the event loop would still be quite busy just from all the EventEmitters firing constantly.
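
To get a rough feel for the synchronous cost described above, here is a small sketch that times JSON.parse over fake member chunks. The payload shape is a simplified, hypothetical stand-in for a GUILD_MEMBERS_CHUNK dispatch, and the helper names makeChunk and parseAll are made up for this example. In the real client each gateway message is parsed in its own callback rather than in one loop, so this only approximates the total time during which timers (such as the heartbeat) have to wait.

```javascript
const { performance } = require('node:perf_hooks');

// Build a fake chunk payload; the member shape is simplified and hypothetical.
function makeChunk(size) {
  const members = Array.from({ length: size }, (_, i) => ({
    user: { id: String(i), username: `user${i}` },
    roles: [],
    joined_at: '2023-06-02T19:06:00.000Z',
  }));
  return JSON.stringify({ op: 0, t: 'GUILD_MEMBERS_CHUNK', d: { members } });
}

// Parse every chunk synchronously and report how long the thread was busy.
function parseAll(chunks) {
  const start = performance.now();
  let memberCount = 0;
  for (const raw of chunks) {
    memberCount += JSON.parse(raw).d.members.length;
  }
  return { memberCount, blockedMs: performance.now() - start };
}

// 1.3M members at up to 1000 per chunk is roughly 1300 chunks; use a smaller sample here.
const sampleChunks = Array.from({ length: 50 }, () => makeChunk(1000));
const parseResult = parseAll(sampleChunks);
console.log(`${parseResult.memberCount} members parsed, thread busy for ${parseResult.blockedMs.toFixed(1)}ms`);
```

Scaling the sample up toward 1300 chunks, and adding the per-member caching and event emission the library performs on top of parsing, gives an idea of why the loop can fall behind.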

As an extreme and absurd solution, if you seriously need this working perfectly, you'd genuinely need a gateway written in a different language that can handle the load, and use @discordjs/core with a message broker in the middle.

All of the above mostly applies to not getting the interaction events in time (and similar issues). Some improvements could arguably be made in the zombie connection detection area, but I'm skeptical of the benefits and correctness of the proposed changes we've discussed.

didinele avatar Oct 12 '23 16:10 didinele