Cluster is trying to communicate with a silo using its old IP address
Environment:
- net8.0
- Hosted on Kubernetes (k8s) in Azure
- Azure Table Storage clustering provider
- Packages:
  - Microsoft.Orleans.Server 8.2.0
  - Microsoft.Orleans.Hosting.Kubernetes 8.2.0
  - Microsoft.Orleans.Clustering.AzureStorage 8.2.0
Problem:
Last night, 6 of 12 silos restarted for an unknown reason (probably a transient network issue). Afterwards, three pods each had two records in the membership table: one Dead with the old IP address and one Alive with the new IP address. However, for the next 6 hours after the restarts (until we did a full deployment), the cluster kept trying to communicate with one pod using its old IP address.
Exceptions:
Exception while sending message: Orleans.Runtime.Messaging.ConnectionFailedException: Unable to connect to S10.142.35.18:11111:101390272, will retry after 867.2316ms
at Orleans.Runtime.Messaging.ConnectionManager.GetConnectionAsync(SiloAddress endpoint) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 99
at Orleans.Runtime.Messaging.MessageCenter.<SendMessage>g__SendAsync|30_0(MessageCenter messageCenter, ValueTask`1 connectionTask, Message msg) in /_/src/Orleans.Runtime/Messaging/MessageCenter.cs:line 236
Exception while sending message: Orleans.Runtime.Messaging.ConnectionFailedException: Connection attempt to endpoint S10.142.35.18:11111:101390272 timed out after 00:00:05
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 219
at Orleans.Runtime.Messaging.ConnectionManager.GetConnectionAsync(SiloAddress endpoint) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 106
at Orleans.Runtime.Messaging.ConnectionManager.GetConnectionAsync(SiloAddress endpoint) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 106
During these hours there were also a lot of "Target silo is known to be dead" errors.
Is this a known issue?