Rebus.RabbitMq

ChannelClosedException when receiving next message due to DNS error

Open Hugzy opened this issue 2 years ago • 1 comments

I've started to encounter an issue where Rebus fails to receive the next message from a queue because the channel has been closed.

My setup is a web server that also runs RabbitMQ, plus batch processing software on a separate server that consumes messages from RabbitMQ and processes the requests accordingly. Sometimes it completely stops dequeuing messages, and a restart is needed before the service receives messages again.

The error message:

An error occurred when attempting to receive the next message: Rebus.Exceptions.RebusApplicationException: Unexpected exception thrown while trying to dequeue a message from rabbitmq, queue address: DigiBatch
 ---> System.Threading.Channels.ChannelClosedException: The channel has been closed.
   at Rebus.RabbitMq.RabbitMqTransport.Receive(ITransactionContext context, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at Rebus.RabbitMq.RabbitMqTransport.Receive(ITransactionContext context, CancellationToken cancellationToken)
   at Rebus.Workers.ThreadPoolBased.ThreadPoolWorker.ReceiveTransportMessage(CancellationToken token, ITransactionContext context)

I suspect it has something to do with the DNS timeout we see in the Windows system logs just before this exception appears in our own logs. (The servers run in Azure and communicate through an Azure DNS.)

I've tracked the exception to this particular line of code in the Rebus codebase: https://github.com/rebus-org/Rebus.RabbitMq/blob/44363284c11b97f63b89c4f2d928db9593275008/Rebus.RabbitMq/RabbitMq/RabbitMqTransport.cs#L523

If the DNS times out but comes back on after a while, shouldn't Rebus be able to reestablish the connection and continue to consume messages, or is there something that needs tweaking in this case?

Hugzy, Sep 30 '22 08:09
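
One way to test the DNS suspicion above is to log name resolution from the consuming server and compare the timestamps with the ChannelClosedException entries. The following is only a sketch: the host name `rabbitmq.example.internal` and the 30-second interval are placeholders, not values from this thread.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

class DnsProbe
{
    static async Task Main(string[] args)
    {
        // Hypothetical broker host name; pass the real one as the first argument.
        var host = args.Length > 0 ? args[0] : "rabbitmq.example.internal";

        while (true)
        {
            try
            {
                // Ask the operating system's resolver for the host's addresses.
                var addresses = await Dns.GetHostAddressesAsync(host);
                var joined = string.Join(", ", addresses.Select(a => a.ToString()));
                Console.WriteLine($"{DateTime.UtcNow:O} resolved {host} -> {joined}");
            }
            catch (SocketException ex)
            {
                // Lookup failures (timeouts, unknown host) surface as SocketException.
                Console.WriteLine($"{DateTime.UtcNow:O} DNS lookup for {host} failed: {ex.SocketErrorCode}");
            }

            // Arbitrary polling interval, chosen only for illustration.
            await Task.Delay(TimeSpan.FromSeconds(30));
        }
    }
}
```

If lookup failures in this log line up with the moments the worker stops receiving, that supports the DNS theory; if lookups keep succeeding, the cause is more likely elsewhere.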

> If the DNS times out but comes back on after a while, shouldn't rebus be able to reestablish the connection and continue to consume messages, or is there something that needs tweaking in this case?

I don't know, actually. Rebus doesn't really do anything with the RabbitMQ connection strings besides passing them to the RabbitMQ driver's connection factory, setting AutomaticRecoveryEnabled=true. If that's not enough, then I don't know what else Rebus could do to "survive" DNS timeouts... could it be a case of DNS-to-IP mappings being cached for a while or something like that?

mookid8000, Oct 03 '22 21:09
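
For reference, the driver-level recovery setting mentioned above looks roughly like this when set directly on the RabbitMQ .NET client. This is a sketch under assumptions, not Rebus's actual code: it assumes RabbitMQ.Client 6.x (where the interval and heartbeat properties are `TimeSpan` values), and the host name and timing values are placeholders.

```csharp
using System;
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

class RecoverySketch
{
    static void Main()
    {
        var factory = new ConnectionFactory
        {
            HostName = "rabbitmq.example.internal",              // hypothetical host name
            AutomaticRecoveryEnabled = true,                     // the setting referred to above
            TopologyRecoveryEnabled = true,                      // re-declare queues and consumers after recovery
            NetworkRecoveryInterval = TimeSpan.FromSeconds(10),  // illustrative wait between recovery attempts
            RequestedHeartbeat = TimeSpan.FromSeconds(30)        // illustrative heartbeat to detect dead connections
        };

        using var connection = factory.CreateConnection();

        // Log when the connection goes down, to correlate with DNS events.
        connection.ConnectionShutdown += (sender, e) =>
            Console.WriteLine($"{DateTime.UtcNow:O} connection shut down: {e.ReplyText}");

        // Assumption: in 6.x, recovering connections expose these events
        // through IAutorecoveringConnection.
        if (connection is IAutorecoveringConnection recovering)
        {
            recovering.RecoverySucceeded += (sender, e) =>
                Console.WriteLine($"{DateTime.UtcNow:O} connection recovered");
            recovering.ConnectionRecoveryError += (sender, e) =>
                Console.WriteLine($"{DateTime.UtcNow:O} recovery attempt failed: {e.Exception.Message}");
        }

        Console.WriteLine("Connected; press Enter to exit.");
        Console.ReadLine();
    }
}
```

Logging the shutdown and recovery events would at least show whether the driver attempts recovery after a DNS outage and whether those attempts keep failing.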

Hi @Hugzy, I'll close this one for now, assuming you fixed your issue. Let me know if that isn't the case.

mookid8000, Dec 21 '22 12:12