rabbitmq-dotnet-client

Connection topology recovery can fail to restore consumers

Open pierresetteskog opened this issue 4 years ago • 3 comments

Sometimes our network is unstable and our services just stop listening for messages. It seems that the client gets an initial connection but very quickly loses it again. It also seems that 6.x and 7.x deliberately use a try/catch to simply continue if any errors occur while registering consumers etc.

Why not just add a throw and let it retry after 5 seconds if it can't recover the full topology?

private void HandleTopologyRecoveryException(TopologyRecoveryException e)
{
    ESLog.Error("Topology recovery exception", e);
    // throw e; // if this is added it works as expected
}

Should I create a PR?

Log error: payload: {"Type":"RabbitMQ.Client.Exceptions.TopologyRecoveryException","Message":"Caught an exception while recovering consumer amq.ctag-MsJoUkH2fqy6Ttm4ks528Q on queue Summoning.WheelChange.BookingExpiration: Already closed: The AMQP operation was interrupted: AMQP close-reason, initiated by Peer, code=404, text='NOT_FOUND - queue 'Summoning.WheelChange.BookingExpiration' in vhost '/' process is stopped by supervisor', classId=50, methodId=10","StackTrace":"","InnerException":"RabbitMQ.Client.Exceptions.AlreadyClosedException: Already closed: The AMQP operation was interrupted: AMQP close-reason, initiated by Peer, code=404, text='NOT_FOUND - queue 'Summoning.WheelChange.BookingExpiration' in vhost '/' process is stopped by supervisor', classId=50, methodId=10\n at RabbitMQ.Client.Impl.SessionBase.Transmit(OutgoingCommand& cmd)\n at RabbitMQ.Client.Impl.ModelBase.ModelSend(MethodBase method, ContentHeaderBase

pierresetteskog avatar May 18 '21 09:05 pierresetteskog

Because topology recovery can fail for all kinds of reasons. We cannot assume that every error is related to connection state. I'd only consider a PR that retries on exceptions that we know are related to connectivity.

Specifically

NOT_FOUND - queue 'Summoning.WheelChange.BookingExpiration' in vhost '/' process is stopped by supervisor'

suggests that the node hosting the leader of that queue has been shutting down.

michaelklishin avatar May 18 '21 09:05 michaelklishin

Thanks, I will give it a try in our environment first. Every month the systems team does something to our environment.

private void HandleTopologyRecoveryException(TopologyRecoveryException e)
{
    ESLog.Error("Topology recovery exception", e);
    if (e.InnerException is AlreadyClosedException)
    {
        throw e;
    }
}

pierresetteskog avatar May 18 '21 09:05 pierresetteskog

Will there be a NuGet release for my PR to the 6.x branch, or when will the next master release be? :)

pierresetteskog avatar Jun 01 '21 14:06 pierresetteskog

Closing because this appears to have been fixed.

lukebakken avatar Nov 18 '23 00:11 lukebakken