dotnet-etcd

Watch stops listening to changes after server restart

Open Sirozha1337 opened this issue 1 year ago • 3 comments

Describe the bug

Restarting a server that runs etcd breaks watches in services running on other servers.

To Reproduce

  1. Run Server 1 with etcd
  2. Run Server 2 with a service using this library. Example code from service:
_client = new EtcdClient(_options.ConnectionString, _options.Port);
try
{
    // Await the watch so that a failure surfaces in the catch block below.
    await _client.WatchRangeAsync(prefix, callback, EnsureAuthentication(), cancellationToken: cancellationToken);
}
catch (Exception ex)
{
    _logger.LogError(ex, "Error in Watch!");
}
  3. Stop Server 1
  4. Check that there's no exception in Server 2
  5. Start Server 1
  6. Make changes to keys in etcd
  7. Check that there are no exceptions in Server 2 and that "callback" is not called

Expected behavior

WatchRange should throw an error, just as it does when only the etcd service is restarted.

Additional context

It seems the problem is the difference between a service shutdown and a server shutdown:

  • When you shut down a service and try to connect to it with telnet, you get "connection refused".
  • When you do the same after shutting down the whole server, the connection attempt times out.
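The two failure modes above can also be told apart from code rather than telnet. The following is a minimal sketch, assuming .NET 5 or later; the `TcpProbe` class and `ProbeAsync` method are hypothetical names introduced for illustration and are not part of dotnet-etcd.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of dotnet-etcd.
public static class TcpProbe
{
    // Returns "open", "refused", "timeout", or "error:<SocketError>".
    public static async Task<string> ProbeAsync(string host, int port, TimeSpan timeout)
    {
        using var client = new TcpClient();
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            // The ConnectAsync overload taking a CancellationToken exists from .NET 5.
            await client.ConnectAsync(host, port, cts.Token);
            return "open";
        }
        catch (OperationCanceledException)
        {
            // No answer before our own deadline: the host is likely down or unreachable.
            return "timeout";
        }
        catch (SocketException ex)
        {
            return ex.SocketErrorCode switch
            {
                // The host answered, but nothing listens on the port.
                SocketError.ConnectionRefused => "refused",
                // The operating system gave up before our deadline did.
                SocketError.TimedOut => "timeout",
                var other => $"error:{other}"
            };
        }
    }
}
```

Calling `await TcpProbe.ProbeAsync(host, 2379, TimeSpan.FromSeconds(3))` against the etcd host (2379 being etcd's default client port) should report "refused" when only the etcd service is stopped and "timeout" when the whole machine is down. An already established connection only observes the second case if something, such as a keep-alive ping, actively checks it.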

Sirozha1337, Jul 28 '23 07:07

Can you confirm the version of the library being used? We do have retry logic in place for connection failures (StatusCode.Unavailable).

shubhamranjan, Oct 19 '23 08:10

The latest one - 6.2.0-beta

I've managed to fix this problem by providing a SocketsHttpHandler configured with keep-alive timeouts in configureChannelOptions:

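// Requires: using System.Net.Http;
_client = new EtcdClient(_options.ConnectionString, _options.Port, configureChannelOptions:
    channelOptions =>
    {
        var handler = new SocketsHttpHandler
        {
            // Send an HTTP/2 keep-alive ping after 30 seconds without traffic.
            KeepAlivePingDelay = TimeSpan.FromSeconds(30),
            // Treat the connection as dead if the ping is not answered within 30 seconds.
            KeepAlivePingTimeout = TimeSpan.FromSeconds(30),
            // This property is an HttpKeepAlivePingPolicy enum; Always is assumed here.
            KeepAlivePingPolicy = HttpKeepAlivePingPolicy.Always
        };

        channelOptions.HttpHandler = handler;
        channelOptions.ThrowOperationCanceledOnCancellation = true;
    });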

The default handler doesn't send keep-alive pings, so it never notices that etcd is down. With this configuration it sends a ping after 30 seconds without traffic, and if the ping is not answered within 30 seconds the watch fails with an exception.
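Once the watch fails with an exception instead of hanging silently, the calling service can re-establish it. The following is a minimal sketch of such a restart loop, assuming the caller wants to resubscribe itself; `WatchSupervisor` and `RunAsync` are hypothetical names, not part of dotnet-etcd, and the library's own retry logic may already cover some of these cases.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of dotnet-etcd.
public static class WatchSupervisor
{
    // Runs startWatch until cancellation, restarting it after each failure or completion.
    public static async Task RunAsync(
        Func<CancellationToken, Task> startWatch,
        TimeSpan retryDelay,
        Action<Exception> onError,
        CancellationToken cancellationToken)
    {
        while (!cancellationToken.IsCancellationRequested)
        {
            try
            {
                await startWatch(cancellationToken);
            }
            catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
            {
                return; // Normal shutdown requested by the caller.
            }
            catch (Exception ex)
            {
                onError(ex); // For example, a dead connection detected by the keep-alive ping.
            }

            try
            {
                // Wait before resubscribing so a down server is not hammered.
                await Task.Delay(retryDelay, cancellationToken);
            }
            catch (OperationCanceledException)
            {
                return;
            }
        }
    }
}
```

With the names from the reproduction code, the watch could be started as `WatchSupervisor.RunAsync(ct => _client.WatchRangeAsync(prefix, callback, EnsureAuthentication(), cancellationToken: ct), TimeSpan.FromSeconds(5), ex => _logger.LogError(ex, "Error in Watch!"), cancellationToken)`. The five-second delay is an arbitrary illustrative value.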

Sirozha1337, Oct 22 '23 14:10

Thank you. That is a good recommendation. I will see whether some sensible defaults can fit in.

shubhamranjan, Oct 22 '23 15:10