rabbitmq-dotnet-client icon indicating copy to clipboard operation
rabbitmq-dotnet-client copied to clipboard

WaitForConfirmsOrDie won't return if DeliveryTag on the client is diverged from the server

Open stukselbax opened this issue 3 years ago • 3 comments

We have a situation in which the IModel is no longer usable for publishing and waiting for publish confirms with WaitForConfirmsOrDie() method. We have injected CorrelationId to the IBasicProperties object when publish a message for distributed tracing. Sometime our correlationId was too big, so the error occurred when the attempt to publish an event was made. We have stuck on the method WaitForConfirmsOrDie(), it blocks the thread and didn't return. It turns out that in such situation the delivery tag tracked on the client has been diverged from delivery tag tracked by the server. We have fixed the issue currently by moving correlationId to headers, nevertheless, I think it is a bug on the RabbitMQ.Client side.

When client generates a new delivery tag it should revert it back if it failed to send the message.

Given the example, we will never see "I'm done..." in the console:

Client delivery tag 1
Acked 1
Client delivery tag 2
Error when trying to publish with long string: Value exceeds the maximum allowed length of 255 bytes. (Parameter 'val')
Actual value was oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo.
Client delivery tag 3
Acked 2

So, here the reproducible example (tried with 5.1.2 and 6.2.1 version - behavior is the same):

        static void Main(string[] args)
        {
            var connectionFactory = new ConnectionFactory(); // Configure here your RabbitMQ
            var connection = connectionFactory.CreateConnection();
            var channel = connection.CreateModel();
            channel.ExchangeDeclare("sample", "fanout", autoDelete: true);
            channel.BasicAcks += (s, e) => Console.WriteLine("Acked {0}", e.DeliveryTag);
            channel.ConfirmSelect();

            var properties = channel.CreateBasicProperties();
            Console.WriteLine("Client delivery tag {0}", channel.NextPublishSeqNo);
            channel.BasicPublish("sample", string.Empty, properties, new byte[1] { 42 });
            channel.WaitForConfirmsOrDie();

            properties = channel.CreateBasicProperties();
            try
            {
                properties.CorrelationId = new string('o', 256);
                Console.WriteLine("Client delivery tag {0}", channel.NextPublishSeqNo);
                channel.BasicPublish("sample", string.Empty, properties, new byte[1] { 42 });
                channel.WaitForConfirmsOrDie();
            }
            catch (Exception e)
            {
                Console.WriteLine("Error when trying to publish with long string: {0}", e.Message);
            }

            properties = channel.CreateBasicProperties();
            Console.WriteLine("Client delivery tag {0}", channel.NextPublishSeqNo);
            channel.BasicPublish("sample", string.Empty, properties, new byte[1] { 42 });
            channel.WaitForConfirmsOrDie();
            Console.WriteLine("I'm done...");
        }

stukselbax avatar May 06 '21 14:05 stukselbax

WaitForConfirmsOrDie is expected to block until a timeout hits or all outstanding messages are confirmed.

My best guess here is that the server acknowledges with multiple set to true and the client does not handle that correctly but this is something that has to be confirmed. Since you have a way to reproduce and presumably trace what's going on in the client, please do that. Note that using a debugger can have produce serious side effects on this code path, so tracing/debug is what should be used if possible.

michaelklishin avatar May 06 '21 15:05 michaelklishin

My best guess here is that the server acknowledges with multiple set to true

No, server sends basic.ack and multiple = false, because it didn't ever seen the second message.

I've added it to output , and it looks like this:

channel.BasicAcks += (s, e) => Console.WriteLine("Acked {0}, multiple: {1}", e.DeliveryTag, e.Multiple);
Client delivery tag 1
Acked 1, multiple: False
Client delivery tag 2
Error when trying to publish with long string: Value exceeds the maximum allowed length of 255 bytes. (Parameter 'val')
Actual value was oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo.
Client delivery tag 3
Acked 2, multiple: False

stukselbax avatar May 06 '21 15:05 stukselbax

WaitForConfirmsOrDie is expected to block until a timeout hits or all outstanding messages are confirmed.

Message 1 was sent. Message 2 was tried to send, but error ocured before it hits the socket (when building actual amqp payload). Message 3 was sent, it was acked, but WaitForConfirmsOrDie didn't think that it was acked, it thinks that Message 2 was acked, because server send delivery tag 2.

IMHO this is a bug in a library. Possible fix is to revert client side delivery tag, or do generate it as late as possible before sending to socket.

stukselbax avatar May 06 '21 15:05 stukselbax

@stukselbax it has been a long time, but thank you for this bug report. The fix will ship in version 6.9.0 of this library.

lukebakken avatar Dec 26 '23 14:12 lukebakken

6.x version - https://github.com/rabbitmq/rabbitmq-dotnet-client/pull/1454

lukebakken avatar Dec 27 '23 15:12 lukebakken