Creating a subscription with Event Store offline at first hangs for a while and causes high CPU usage
Describe the bug The docs don't describe resubscribing after dropped subscriptions in detail, so I came up with my own solution (for simplicity's sake I'm glossing over checkpoints here):
static async Task Subscribe(EventStoreClient c, string stream)
{
while (true)
{
try
{
await c.SubscribeToStreamAsync(stream, (sub, e, ct) =>
{
Console.WriteLine(e.Event.EventId);
return Task.CompletedTask;
},
subscriptionDropped: async (sub, reason, ex) =>
{
if (reason != SubscriptionDroppedReason.Disposed)
await Subscribe(c, stream);
}
);
return;
}
catch
{
await Task.Delay(5000);
}
}
}
static async Task Main(string[] args)
{
var c = new EventStoreClient(EventStoreClientSettings.Create("esdb://localhost:2113?tls=false"));
await Subscribe(c, "newstream");
Thread.Sleep(-1);
}
This works very well when I start the application and restart Event Store--resubscriptions and replays are almost instant. The problem appears when I start the application while Event Store is offline. When I bring it online, the call to SubscribeToStreamAsync hangs for around 2 minutes, maxing out a CPU core and, according to VS, allocating 1.5GB over that period (it must get collected quickly because memory used remains stable), but it does succeed in the end. Resubscribing after subsequent Event Store restarts is immediate again.
To Reproduce Steps to reproduce the behavior:
- Append some events to
newstream - Shut down Event Store
- Run the code above and wait for a couple of seconds
- Start Event Store
- Wait until event IDs are printed while observing the process's CPU usage in Task Manager.
Expected behavior
SubscribeToStreamAsync yields quickly when the application is started with Event Store offline and then launched.
Actual behavior
SubscribeToStreamAsync takes a couple of minutes and consumes resources when the application is started with Event Store offline and then launched.
EventStore details
- EventStore server version: eventstore/eventstore:21.2.0-buster-slim image
- Operating system: Win 10, Ubuntu 20.04.1 via WSL2
- EventStore client version (if applicable): EventStore.Client.Grpc.Streams 21.2.0
- .NET runtime 5.0.7
I observe the same behavior, albeit with the eventstore/eventstore:21.10.1-buster-slim image.
Hi @kerams and @james-allan-lloyd thanks for the detailed report, I can reproduce this on EventStoreDB 22.6.0, and the master branch for the gRPC client.
However, I'm only seeing this issue when running EventStoreDB in insecure mode in docker. I don't see it when running EventStoreDB locally, or when I run EventStoreDB in a secure docker container.
Can you confirm this as well?
In the logs, I always see a deadline exceeded error before the client takes a long time to resubscribe, which leads me to believe that there's a timeout that's not handled correctly internally in the client.
It was indeed insecure in a container. Unfortunately, I do not have the opportunity to try other scenarios right now.
I was analyzing this issue too. I did some profiling and I think the problem seems to be in this method, according to the profiler: https://github.com/EventStore/EventStore-Client-Dotnet/blob/8ec180533c1f1c5e868744e9f006b20f135bc450/src/EventStore.Client/SharingProvider.cs#L85