
[Bug]: Operation timeout with creation of 6+ shards

Open nikita-petko opened this issue 1 year ago • 15 comments

Check The Docs

  • [X] I double checked the docs and couldn't find any useful information.

Verify Issue Source

  • [X] I verified the issue was caused by Discord.Net.

Check your intents

  • [X] I double checked that I have the required intents.

Description

Note: Please view edit history to see the original purpose of this issue.

If you create a sharded client with 6 or more shards, at around the 6th shard, all shards are prevented from running. The bot will continue to receive dispatches but will be unable to actually process the events (debug logging notes the dispatches but handlers are not being invoked).

This may relate to one of my old issues: #2126

Version

3.11.0

Working Version

No response

Logs

[2023-08-01T03:43:27.3327Z][0029][][KVEX-WIN-234][bot][ERROR] DiscordInternal-EXCEPTION-Shard #6:
Error Type: System.TimeoutException
Error Detail: The operation has timed out.
Inner Exception:
Exception Stack Trace:
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.ConnectionManager.<WaitAsync>d__34.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.WebSocket.DiscordSocketClient.<OnConnectingAsync>d__118.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.ConnectionManager.<ConnectAsync>d__31.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext()
Exception Source: mscorlib
Exception TargetSite: Void Throw()
Exception Data: System.Collections.ListDictionaryInternal

Sample

Instantiation

var client = new DiscordShardedClient(
	new DiscordSocketConfig
	{
		GatewayIntents =
			GatewayIntents.GuildMessages
			| GatewayIntents.DirectMessages
			| GatewayIntents.Guilds
			| GatewayIntents.MessageContent,
		LogGatewayIntentWarnings = false,
		TotalShards = 10,
		LogLevel = LogSeverity.Debug,
	}
);

Packages

N/A

nikita-petko avatar Jul 29 '23 03:07 nikita-petko

Assuming you are in a Linux environment, could you attempt to reproduce this in a Windows environment and let me know your results? @nikita-petko

DeclanFrampton avatar Jul 31 '23 11:07 DeclanFrampton

@DeclanFrampton these were all performed on Windows machines.

nikita-petko avatar Jul 31 '23 12:07 nikita-petko

@DeclanFrampton these were all performed on Windows machines.

That's that idea out the window, then. What are the system specs? (Probably not the issue, but it's always good to have more info.)

When you did a test with a different token for a single server, did you use the same project? Also, have you made any changes to the bot around the same time the issue began?

DeclanFrampton avatar Jul 31 '23 12:07 DeclanFrampton

24 cores, 32GiB of physical memory. Windows Server 2019 Datacenter. 10GbE

  1. Yes
  2. The version was running fine and then stopped working completely. I put it down to maybe needing to update Discord.Net, but updating didn't fix it.

nikita-petko avatar Jul 31 '23 13:07 nikita-petko

24 cores, 32GiB of physical memory. Windows Server 2019 Datacenter. 10GbE

  1. Yes
  2. The version was running fine and then stopped working completely. I put it down to maybe needing to update Discord.Net, but updating didn't fix it.

Okay, since I can't debug this myself as I don't have 15k guilds to reproduce this, could you set up a new solution and use the same token? Try it both with and without sharding and see whether you still get the same issues. The bot doesn't need to have any features, just a fresh standalone build to test with.

If you still get the issue, I'll bring it up in the Discord server to see if we can get its priority raised.

DeclanFrampton avatar Jul 31 '23 15:07 DeclanFrampton

@DeclanFrampton this is now the error that is encountered despite sharding being enabled.

04:10:02 Shard #0    System.Exception: WebSocket connection was closed ---> Discord.Net.WebSocketClosedException: The server sent close 4011: "Sharding required."
   at Discord.Net.WebSockets.DefaultWebSocketClient.<RunAsync>d__34.MoveNext()
   --- End of inner exception stack trace ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.ConnectionManager.<WaitAsync>d__34.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.WebSocket.DiscordSocketClient.<OnConnectingAsync>d__118.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.ConnectionManager.<ConnectAsync>d__31.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext()
04:10:02 Shard #0    Disconnecting

nikita-petko avatar Aug 01 '23 03:08 nikita-petko

I have found the final issue: it failed to connect because there were not enough shards. I have now set it up to fetch the shard count automatically.
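For anyone who hits the same 4011 "Sharding required." close code, here is a minimal sketch of one way to fetch the shard count instead of hard-coding it. This is not necessarily the setup used above. It assumes the token is supplied through a hypothetical `DISCORD_TOKEN` environment variable, and it relies on `GetRecommendedShardCountAsync`, which to the best of my knowledge Discord.Net exposes on its clients; please check it against the version you are running.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Discord;
using Discord.Rest;
using Discord.WebSocket;

public static class Program
{
    public static async Task Main()
    {
        // Assumption: the bot token lives in an environment variable named DISCORD_TOKEN.
        string token = Environment.GetEnvironmentVariable("DISCORD_TOKEN")
            ?? throw new InvalidOperationException("DISCORD_TOKEN is not set.");

        // Ask Discord how many shards it recommends for this bot before
        // any gateway connection is opened.
        int recommendedShards;
        using (var rest = new DiscordRestClient())
        {
            await rest.LoginAsync(TokenType.Bot, token);
            recommendedShards = await rest.GetRecommendedShardCountAsync();
        }

        // Same configuration as in the report, with TotalShards taken from
        // the recommendation rather than a fixed number.
        var client = new DiscordShardedClient(new DiscordSocketConfig
        {
            GatewayIntents =
                GatewayIntents.GuildMessages
                | GatewayIntents.DirectMessages
                | GatewayIntents.Guilds
                | GatewayIntents.MessageContent,
            TotalShards = recommendedShards,
            LogLevel = LogSeverity.Debug,
        });

        // Print library log messages so per-shard connect and timeout events are visible.
        client.Log += message =>
        {
            Console.WriteLine(message.ToString());
            return Task.CompletedTask;
        };

        await client.LoginAsync(TokenType.Bot, token);
        await client.StartAsync();

        // Keep the process alive while the shards run.
        await Task.Delay(Timeout.Infinite);
    }
}
```

Note that this only addresses the "not enough shards" failure. It is not expected to change the timeout seen with 6 or more shards.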

Thank you for your help.

nikita-petko avatar Aug 01 '23 03:08 nikita-petko

Reopening as a new error has been encountered. Once this error occurs, it is thrown continuously. It may be occurring in the GuildDownloader, but it also happens in my other test. After the 6th or 7th shard it will always throw this error.

[2023-08-01T03:43:27.3327Z][0029][][KVEX-WIN-234][bot][ERROR] DiscordInternal-EXCEPTION-Shard #6:
Error Type: System.TimeoutException
Error Detail: The operation has timed out.
Inner Exception:
Exception Stack Trace:
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.ConnectionManager.<WaitAsync>d__34.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.WebSocket.DiscordSocketClient.<OnConnectingAsync>d__118.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Discord.ConnectionManager.<ConnectAsync>d__31.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext()
Exception Source: mscorlib
Exception TargetSite: Void Throw()
Exception Data: System.Collections.ListDictionaryInternal

nikita-petko avatar Aug 01 '23 03:08 nikita-petko

I can confirm that this issue is persistent when the shard count is greater than or equal to 6. Once I get some free time later today I will take a deeper dive into this, unless someone else gets there first.

12:11:22 Shard #6    System.TimeoutException: The operation has timed out.
   at Discord.ConnectionManager.WaitAsync() in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\ConnectionManager.cs:line 186
   at Discord.WebSocket.DiscordSocketClient.OnConnectingAsync() in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\DiscordSocketClient.cs:line 324
   at Discord.ConnectionManager.ConnectAsync(CancellationTokenSource reconnectCancelToken) in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\ConnectionManager.cs:line 153
   at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext() in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\ConnectionManager.cs:line 77

DeclanFrampton avatar Aug 01 '23 11:08 DeclanFrampton

I conducted some additional testing. It seems the shard that times out usually reconnects directly afterwards. I always get the exception with 7 or more shards. Can you confirm this is the case for you?

14:08:28 Shard #0    System.TimeoutException: The operation has timed out.
   at Discord.ConnectionManager.WaitAsync() in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\ConnectionManager.cs:line 187
   at Discord.WebSocket.DiscordSocketClient.OnConnectingAsync() in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\DiscordSocketClient.cs:line 324
   at Discord.ConnectionManager.ConnectAsync(CancellationTokenSource reconnectCancelToken) in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\ConnectionManager.cs:line 154
   at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext() in C:\Users\development\Source\Repos\Discord.Net\src\Discord.Net.WebSocket\ConnectionManager.cs:line 78
14:08:28 Shard #0    Disconnecting
14:08:34 Shard #0    Disconnected
14:08:36 Shard #0    Connecting
14:08:36 Shard #0    Resumed previous session
14:08:36 Shard #0    Connected

DeclanFrampton avatar Aug 01 '23 13:08 DeclanFrampton

Same but with 6 shards and over:

nikita-petko avatar Aug 01 '23 13:08 nikita-petko

Same but with 6 shards and over:

At least we can reproduce it now. Does the shard reconnect? If so, does it continue to cause issues as time goes by, or does it seem stable?

DeclanFrampton avatar Aug 02 '23 01:08 DeclanFrampton

The shard will just continue to time out, but it will try to reconnect. However, it causes all the other shards to fail.

nikita-petko avatar Aug 02 '23 05:08 nikita-petko

The shard will just continue to time out, but it will try to reconnect. However, it causes all the other shards to fail.

OK, I've asked a maintainer to see if he could take a look as well. Meanwhile, could you provide a full stack trace of the exception from a debug build yourself? Thanks.

DeclanFrampton avatar Aug 02 '23 11:08 DeclanFrampton

Same but with 6 shards and over:

@DeclanFrampton this is the full exception on a debug build.

nikita-petko avatar Aug 02 '23 20:08 nikita-petko