Orleans Silos freezing and crashing
Context: Since we migrated to Orleans 7, we've experienced a few performance issues.
We are running Orleans in Kubernetes with Kubernetes hosting and clustering. Since the migration, silos have been becoming unresponsive. There are few exceptions; calls just time out. We have added call chain reentrancy where needed, but it is still not reliable. Health probing the silos and killing unresponsive ones helps, but the system is down until the failing-probe threshold is hit, so it's not a good fix. We probably have some issues in our own code, but it's hard to know where to start looking when the silos just die.

Here are the logs from around the time silo 10.244.6.9 became unreachable (newest first):
2024-06-04 12:41:10.476 | {"Message":"Exception publishing client routing table to silo \"S10.244.6.9:11111:76427299\"","MessageTemplate":"Exception publishing client routing table to silo {SiloAddress}","Exception":{"Type":"Orleans.Runtime.OrleansMessageRejectionException","Message":"Exception while sending message: Orleans.Runtime.Messaging.ConnectionFailedException: Unable to connect to S10.244.6.9:11111:76427299, will retry after 895.3856ms\n at Orleans.Runtime.Messaging.ConnectionManager.GetConnectionAsync(SiloAddress endpoint) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 99\n at Orleans.Runtime.Messaging.MessageCenter.<SendMessage>g__SendAsync|29_0(MessageCenter messageCenter, ValueTask`1 connectionTask, Message msg) in /_/src/Orleans.Runtime/Messaging/MessageCenter.cs:line 226","StackTrace":" at Orleans.Serialization.Invocation.ResponseCompletionSource.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) in /_/src/Orleans.Serialization/Invocation/ResponseCompletionSource.cs:line 98\n at System.Threading.Tasks.ValueTask.ValueTaskSourceAsTask.<>c.<.cctor>b__4_0(Object state)\n--- End of stack trace from previous location ---\n at Orleans.Runtime.GrainDirectory.ClientDirectory.PublishUpdates() in /_/src/Orleans.Runtime/GrainDirectory/ClientDirectory.cs:line 499"},"SiloAddress":"S10.244.6.9:11111:76427299","ExceptionDetail":{"HResult":-2146233088,"Message":"Exception while sending message: Orleans.Runtime.Messaging.ConnectionFailedException: Unable to connect to S10.244.6.9:11111:76427299, will retry after 895.3856ms\n at Orleans.Runtime.Messaging.ConnectionManager.GetConnectionAsync(SiloAddress endpoint) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 99\n at Orleans.Runtime.Messaging.MessageCenter.<SendMessage>g__SendAsync|29_0(MessageCenter messageCenter, ValueTask`1 connectionTask, Message msg) in /_/src/Orleans.Runtime/Messaging/MessageCenter.cs:line 226","Source":"System.Private.CoreLib","TargetSite":"Void Throw()","Type":"Orleans.Runtime.OrleansMessageRejectionException"},"app":"Innbyggertjenester.Silo","ENV":"Production","APP_NAME":"innbyggertjenester-silo-e2e","POD_NAMESPACE":"e2e","POD_NAME":"innbyggertjenester-silo-e2e-58646859d8-5htvm"}
2024-06-04 12:41:10.385 | {"Message":"Indirect probe request #60 to silo \"S10.244.6.9:11111:76427299\" via silo \"S10.244.5.16:11111:76502162\" failed after 00:00:01.5819712 with a direct probe response time of 00:00:01.5604363. Failure message: \"Encountered exception \nExc level 0: Orleans.Runtime.OrleansMessageRejectionException: Exception while sending message: Orleans.Runtime.Messaging.ConnectionFailedException: Unable to connect to endpoint S10.244.6.9:11111:76427299. See InnerException\n ---> Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 10.244.6.9:11111. Error: HostUnreachable\n at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 65\n at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 61\n at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 193\n --- End of inner exception stack trace ---\n at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 221\n at Orleans.Runtime.Messaging.ConnectionManager.GetConnectionAsync(SiloAddress endpoint) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 106\n at Orleans.Runtime.Messaging.MessageCenter.<SendMessage>g__SendAsync|29_0(MessageCenter messageCenter, ValueTask`1 connectionTask, Message msg) in /_/src/Orleans.Runtime/Messaging/MessageCenter.cs:line 226\n at Orleans.Serialization.Invocation.ResponseCompletionSource.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) in /_/src/Orleans.Serialization/Invocation/ResponseCompletionSource.cs:line 98\n at System.Threading.Tasks.ValueTask.ValueTaskSourceAsTask.<>c.<.cctor>b__4_0(Object state)\n--- End of stack trace from previous location ---\n at Orleans.Internal.OrleansTaskExtentions.WithTimeout(Task taskToComplete, TimeSpan timeout, String exceptionMessage) in /_/src/Orleans.Core/Async/TaskExtensions.cs:line 87\n at Orleans.Runtime.MembershipService.MembershipSystemTarget.ProbeIndirectly(SiloAddress target, TimeSpan probeTimeout, Int32 probeNumber) in /_/src/Orleans.Runtime/MembershipService/MembershipSystemTarget.cs:line 86\". Intermediary health score: 0","MessageTemplate":"Indirect probe request #{Id} to silo {SiloAddress} via silo {IntermediarySiloAddress} failed after {RoundTripTime} with a direct probe response time of {ProbeResponseTime}. Failure message: {FailureMessage}. Intermediary health score: {IntermediaryHealthScore}","Id":60,"SiloAddress":"S10.244.6.9:11111:76427299","IntermediarySiloAddress":"S10.244.5.16:11111:76502162","RoundTripTime":"00:00:01.5819712","ProbeResponseTime":"00:00:01.5604363","FailureMessage":"Encountered exception \nExc level 0: Orleans.Runtime.OrleansMessageRejectionException: Exception while sending message: Orleans.Runtime.Messaging.ConnectionFailedException: Unable to connect to endpoint S10.244.6.9:11111:76427299. See InnerException\n ---> Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 10.244.6.9:11111. Error: HostUnreachable\n at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 65\n at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 61\n at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 193\n --- End of inner exception stack trace ---\n at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 221\n at Orleans.Runtime.Messaging.ConnectionManager.GetConnectionAsync(SiloAddress endpoint) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 106\n at Orleans.Runtime.Messaging.MessageCenter.<SendMessage>g__SendAsync|29_0(MessageCenter messageCenter, ValueTask`1 connectionTask, Message msg) in /_/src/Orleans.Runtime/Messaging/MessageCenter.cs:line 226\n at Orleans.Serialization.Invocation.ResponseCompletionSource.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) in /_/src/Orleans.Serialization/Invocation/ResponseCompletionSource.cs:line 98\n at System.Threading.Tasks.ValueTask.ValueTaskSourceAsTask.<>c.<.cctor>b__4_0(Object state)\n--- End of stack trace from previous location ---\n at Orleans.Internal.OrleansTaskExtentions.WithTimeout(Task taskToComplete, TimeSpan timeout, String exceptionMessage) in /_/src/Orleans.Core/Async/TaskExtensions.cs:line 87\n at Orleans.Runtime.MembershipService.MembershipSystemTarget.ProbeIndirectly(SiloAddress target, TimeSpan probeTimeout, Int32 probeNumber) in /_/src/Orleans.Runtime/MembershipService/MembershipSystemTarget.cs:line 86","IntermediaryHealthScore":0,"app":"Innbyggertjenester.Silo","ENV":"Production","APP_NAME":"innbyggertjenester-silo-e2e","POD_NAMESPACE":"e2e","POD_NAME":"innbyggertjenester-silo-e2e-58646859d8-96gfk"}
2024-06-04 12:41:10.372 | {"Message":"Connection id \"\"0HN44G8KKFI9R\"\", Request id \"\"0HN44G8KKFI9R:000000E9\"\": An unhandled exception was thrown by the application.","MessageTemplate":"Connection id \"{ConnectionId}\", Request id \"{TraceIdentifier}\": An unhandled exception was thrown by the application.","Exception":{"Type":"Orleans.Runtime.OrleansMessageRejectionException","Message":"Exception while sending message: Orleans.Runtime.Messaging.ConnectionFailedException: Unable to connect to endpoint S10.244.6.9:11111:76427299. See InnerException\n ---> Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 10.244.6.9:11111. Error: HostUnreachable\n at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 65\n at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 61\n at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 193\n --- End of inner exception stack trace ---\n at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 221\n at Orleans.Runtime.Messaging.ConnectionManager.GetConnectionAsync(SiloAddress endpoint) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 106\n at Orleans.Runtime.Messaging.MessageCenter.<SendMessage>g__SendAsync|29_0(MessageCenter messageCenter, ValueTask`1 connectionTask, Message msg) in /_/src/Orleans.Runtime/Messaging/MessageCenter.cs:line 226","StackTrace":" at Orleans.Serialization.Invocation.ResponseCompletionSource`1.GetResult(Int16 token) in /_/src/Orleans.Serialization/Invocation/ResponseCompletionSource.cs:line 230\n at System.Threading.Tasks.ValueTask`1.ValueTaskSourceAsTask.<>c.<.cctor>b__4_0(Object state)\n--- End of stack trace from previous location ---\n at OrleansDashboard.DashboardClient.ClusterStats()\n at OrleansDashboard.DashboardMiddleware.Invoke(HttpContext context)\n at Microsoft.AspNetCore.Builder.Extensions.MapMiddleware.InvokeCore(HttpContext context, PathString matchedPath, PathString remainingPath)\n at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[TContext](IHttpApplication`1 application)"},"ConnectionId":"0HN44G8KKFI9R","TraceIdentifier":"0HN44G8KKFI9R:000000E9","EventId":{"Id":13,"Name":"ApplicationError"},"RequestId":"0HN44G8KKFI9R:000000E9","RequestPath":"/dashboard/ClusterStats","ExceptionDetail":{"HResult":-2146233088,"Message":"Exception while sending message: Orleans.Runtime.Messaging.ConnectionFailedException: Unable to connect to endpoint S10.244.6.9:11111:76427299. See InnerException\n ---> Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 10.244.6.9:11111. Error: HostUnreachable\n at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 65\n at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 61\n at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 193\n --- End of inner exception stack trace ---\n at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 221\n at Orleans.Runtime.Messaging.ConnectionManager.GetConnectionAsync(SiloAddress endpoint) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 106\n at Orleans.Runtime.Messaging.MessageCenter.<SendMessage>g__SendAsync|29_0(MessageCenter messageCenter, ValueTask`1 connectionTask, Message msg) in /_/src/Orleans.Runtime/Messaging/MessageCenter.cs:line 226","Source":"System.Private.CoreLib","TargetSite":"Void Throw()","Type":"Orleans.Runtime.OrleansMessageRejectionException"},"app":"Innbyggertjenester.Silo","ENV":"Production","APP_NAME":"innbyggertjenester-silo-e2e","POD_NAMESPACE":"e2e","POD_NAME":"innbyggertjenester-silo-e2e-58646859d8-96gfk"}
2024-06-04 12:41:10.371 | {"Message":"Connection attempt to endpoint \"S10.244.6.9:11111:76427299\" failed","MessageTemplate":"Connection attempt to endpoint {EndPoint} failed","Exception":{"Type":"Orleans.Networking.Shared.SocketConnectionException","Message":"Unable to connect to 10.244.6.9:11111. Error: HostUnreachable","StackTrace":" at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 65\n at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 61\n at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 193"},"EndPoint":"S10.244.6.9:11111:76427299","ExceptionDetail":{"HResult":-2146233088,"Message":"Unable to connect to 10.244.6.9:11111. Error: HostUnreachable","Source":"Orleans.Core","TargetSite":"Void MoveNext()","Type":"Orleans.Networking.Shared.SocketConnectionException"},"app":"Innbyggertjenester.Silo","ENV":"Production","APP_NAME":"innbyggertjenester-silo-e2e","POD_NAMESPACE":"e2e","POD_NAME":"innbyggertjenester-silo-e2e-58646859d8-5htvm"}
I don't have many ideas for a next step, as we have already scoured the solution for reentrancy issues.