
[BUG] Message lock renewal not working for 10+ minute processing durations when using different `ServiceBusReceiver` instances for `CompleteMessageAsync` and `RenewMessageLockAsync`

Open ramonsmits opened this issue 1 year ago • 12 comments

Library name and version

Azure.Messaging.ServiceBus 7.8.0, 7.8.1 and 7.9.0

Describe the bug

When using transactions, invoking RenewMessageLockAsync on a different ServiceBusReceiver, and processing durations exceed 10 minutes, invoking CompleteMessageAsync CAN result in a MessageLockLost if that is the only message being processed concurrently.

A similar bug was reported earlier. That one was fixed in 7.8.0 for the case where RenewMessageLockAsync is invoked on the same receiver as CompleteMessageAsync.

  • https://github.com/Azure/azure-sdk-for-net/issues/28025

Expected behavior

The client does not get into a faulty state where lock renewal appears to succeed but CompleteMessageAsync subsequently fails with MessageLockLost.

Actual behavior

The client can get into a faulty state where:

  • any message that will be ack'd via CompleteMessageAsync, where processing takes longer than the queue lock duration, will fail
  • any message that results in a lock renewal via RenewMessageLockAsync will incorrectly succeed

Reproduction Steps

// Usings needed when this is run as a .NET 6 top-level program
// (implicit usings already cover System, System.Threading and System.Threading.Tasks).
using System.Diagnostics;
using System.Diagnostics.Tracing;
using System.Transactions;
using Azure.Core.Diagnostics;
using Azure.Messaging.ServiceBus;
using Azure.Messaging.ServiceBus.Administration;

var lockDuration = TimeSpan.FromMinutes(5);
var renewalInterval = lockDuration - TimeSpan.FromSeconds(10);
var queueName = "test-" + DateTimeOffset.UtcNow.ToUnixTimeMilliseconds();
var cs = Environment.GetEnvironmentVariable("AzureServiceBus_ConnectionString");
var admin = new ServiceBusAdministrationClient(cs);
await admin.CreateQueueAsync(new CreateQueueOptions(queueName) { LockDuration = lockDuration, AutoDeleteOnIdle = TimeSpan.FromMinutes(10) });
var sw = Stopwatch.StartNew();
using var logger = new AzureEventSourceListener((ea, m) => Console.Out.WriteLineAsync($"{sw.Elapsed,-15:g} [{ea.Level,-13}] {m}"), EventLevel.Verbose);
await using var client = new ServiceBusClient(cs, new ServiceBusClientOptions { EnableCrossEntityTransactions = true });
await using var sender = client.CreateSender(queueName);
await sender.SendMessageAsync(new ServiceBusMessage());
await using var receiver = client.CreateReceiver(queueName);

var message = await receiver.ReceiveMessageAsync();
Trace.Assert(message != null);

// Renew the lock (on a different receiver) until the 11 minute cancellation fires,
// i.e. past the service's 10 minute link idle timeout.
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(11));
await RenewLockAsync(message!, cts.Token);

// Then complete the message within a transaction.
using var ts = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled);
await sender.SendMessageAsync(new ServiceBusMessage());
await receiver.CompleteMessageAsync(message);
ts.Complete();

async Task RenewLockAsync(ServiceBusReceivedMessage message, CancellationToken cancellationToken)
{
    // Renewal is intentionally done on a *different* receiver than the one that
    // received the message and will complete it.
    var renewLockReceiver = client.CreateReceiver(queueName);

    while (!cancellationToken.IsCancellationRequested)
    {
        // Await the delay without throwing when the token is cancelled.
        var delayTask = await Task.Delay(renewalInterval, cancellationToken)
            .ContinueWith(t => t, TaskContinuationOptions.ExecuteSynchronously);
        if (delayTask.IsCanceled) return;
        await renewLockReceiver.RenewMessageLockAsync(message);
    }
}
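
For reference, the timeline this repro is expected to produce, based on the 10 minute idle link close explained further down in the thread (timestamps are approximate and purely illustrative):

  • t ≈ 0:00: the message is received on receiver; that receiver's link then sits idle.
  • t ≈ 4:50 and t ≈ 9:40: RenewMessageLockAsync on the second receiver succeeds, extending the message lock server-side without any activity on the first receiver's link.
  • t ≈ 10:00: the service closes the first receiver's idle receive link.
  • t ≈ 11:00: the cancellation fires, the renewal loop returns, and CompleteMessageAsync on the first receiver fails with MessageLockLost (the reported bug).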

Environment

  • net6
  • windows 10

ramonsmits · Aug 04 '22 08:08

@JoshLove-msft it seems 7.8.1 caused a regression. Behavior is happening in both 7.8.1 and 7.9.0.

ramonsmits · Aug 04 '22 08:08

Thank you for your feedback. Tagging and routing to the team member best able to assist.

jsquire · Aug 04 '22 12:08

@JoshLove-msft it seems 7.8.1 caused a regression. Behavior is happening in both 7.8.1 and 7.9.0.

But you cannot reproduce in 7.8.0?

JoshLove-msft · Aug 04 '22 16:08

@JoshLove-msft good question. I thought I had tested that after its release, as it should have contained the earlier fix, but not today. Let me test that for you and I'll report back. Currently I'm assuming 7.8.0 had the fix.

ramonsmits · Aug 04 '22 16:08

I have not been able to repro the issue on 7.9.0 using your code. Here are the relevant logs. The send link closes due to the idle timeout as expected, but the complete is still successful as the receive link remains open.

0:09:44.7111341 [Informational] test-1659636468657-6aaa3c1d-073f-4520-bfdc-d1e9a2fb1a8e: RenewLockAsync start. MessageCount = 1, LockToken = 1b0df878-9929-41e6-b76e-f8f31603fb55
0:09:44.883603  [Informational] test-1659636468657-6aaa3c1d-073f-4520-bfdc-d1e9a2fb1a8e: RenewLockAsync done. LockToken = 1b0df878-9929-41e6-b76e-f8f31603fb55
0:10:03.2098949 [Informational] Send Link Closed. Identifier: test-1659636468657-498474e3-0fda-4c84-aa30-51c9f823c6ee, linkException: Azure.Messaging.ServiceBus.ServiceBusException: The link 'G8:443458260:amqps://jolovsbdemo.servicebus.windows.net/-5a023aac;0:5:8' is force detached. Code: publisher(link20190829). Details: AmqpMessagePublisher.IdleTimerExpired: Idle timeout: 00:10:00. (GeneralError).
0:11:03.9606736 [Informational] test-1659636468657-498474e3-0fda-4c84-aa30-51c9f823c6ee: SendAsync start. MessageCount = 1
0:11:04.4098184 [Informational] AmqpTransactionDeclared for LocalTransactionId: b7b604f7-8929-4c97-a971-106368e16dee:1 AmqpTransactionId: txn:b418b91f925c4c23be4bb60e9aae2945__G8:24852075_G8.
0:11:04.4117466 [Informational] Creating send link for Identifier: test-1659636468657-498474e3-0fda-4c84-aa30-51c9f823c6ee.
0:11:04.4124126 [Verbose      ] test-1659636468657-498474e3-0fda-4c84-aa30-51c9f823c6ee: Requesting authorization to amqps://jolovsbdemo.servicebus.windows.net/test-1659636468657
0:11:04.587461  [Verbose      ] test-1659636468657-498474e3-0fda-4c84-aa30-51c9f823c6ee: Authorization to amqps://jolovsbdemo.servicebus.windows.net/test-1659636468657 complete. Expiration time: 08/04/2022 18:37:51
0:11:04.7621102 [Informational] Send link created for Identifier: test-1659636468657-498474e3-0fda-4c84-aa30-51c9f823c6ee.
0:11:04.941077  [Informational] test-1659636468657-498474e3-0fda-4c84-aa30-51c9f823c6ee: SendAsync done.
0:11:04.9468972 [Informational] test-1659636468657-6aaa3c1d-073f-4520-bfdc-d1e9a2fb1a8e: CompleteAsync start. MessageCount = 1, LockToken = 1b0df878-9929-41e6-b76e-f8f31603fb55
0:11:05.2011377 [Informational] test-1659636468657-6aaa3c1d-073f-4520-bfdc-d1e9a2fb1a8e: CompleteAsync done. LockToken = 1b0df878-9929-41e6-b76e-f8f31603fb55
0:11:05.4151386 [Informational] AmqpTransactionDischarged for LocalTransactionId: b7b604f7-8929-4c97-a971-106368e16dee:1 AmqpTransactionId: txn:b418b91f925c4c23be4bb60e9aae2945__G8:24852075_G8 Rollback: False.
0:11:05.4192947 [Verbose      ] Closing a ServiceBusReceiver (Identifier 'test-1659636468657-6aaa3c1d-073f-4520-bfdc-d1e9a2fb1a8e').
0:11:05.4354443 [Verbose      ] A ServiceBusReceiver has been closed (Identifier 'test-1659636468657-6aaa3c1d-073f-4520-bfdc-d1e9a2fb1a8e').
0:11:05.451127  [Verbose      ] Closing a ServiceBusSender (Identifier 'test-1659636468657-498474e3-0fda-4c84-aa30-51c9f823c6ee').
0:11:05.4552164 [Verbose      ] A ServiceBusSender has been closed (Identifier 'test-1659636468657-498474e3-0fda-4c84-aa30-51c9f823c6ee').
0:11:05.4582133 [Verbose      ] Closing a ServiceBusClient (Identifier 'jolovsbdemo.servicebus.windows.net-2eba0a28-4fc2-4097-894d-444ddcc79d0b').
0:11:05.4746853 [Verbose      ] A ServiceBusClient has been closed (Identifier 'jolovsbdemo.servicebus.windows.net-2eba0a28-4fc2-4097-894d-444ddcc79d0b').

JoshLove-msft · Aug 04 '22 18:08

@JoshLove-msft It seems my repro code was incorrect and I found the difference. I've updated the description with the following:

When using transactions, invoking RenewMessageLockAsync on a different ServiceBusReceiver, and processing durations exceed 10 minutes, invoking CompleteMessageAsync CAN result in a MessageLockLost if that is the only message being processed concurrently.

The important question is: is this actually a supported scenario?

  • If yes, then it should be fixed
  • If no, then invoking RenewMessageLockAsync on a different ServiceBusReceiver instance should result in an exception.

ramonsmits · Aug 05 '22 10:08

Renewing a message lock with a different receiver is supported. However, using a different receiver will not keep the original receive link alive past the 10 minute idle time enforced by the service. This explains the behavior you are seeing.

JoshLove-msft · Aug 05 '22 16:08

Hi @ramonsmits. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text “/unresolve” to remove the “issue-addressed” label and continue the conversation.

ghost · Aug 05 '22 19:08

@JoshLove-msft I honestly find the response a little confusing. To me it reads as if this is unsupported.

I don't think this issue should be closed yet. This behavior should at least be documented somewhere.

My interpretation is

"It is unsupported but works if done within the 10 minute idle time".

If you take into consideration that the maximum value of LockDuration is 5 minutes, then obviously locks for processing durations past 5 minutes need to be renewed, but it is strange that the idle time is the limiting factor here. For example, with concurrent message processing a message CAN actually be processed for much longer than 10 minutes because the receiver isn't idle.

For now we need to interpret this as unsupported, since processing durations over 10 minutes must be expected.

ramonsmits · Aug 08 '22 07:08

Renewing a lock with a different receiver is supported for an unlimited amount of time. You can keep renewing the lock as long as you want and the message will not be delivered to any receivers during this time. Independent of this, the service has behavior where it closes any idle receiver links after 10 minutes. If you are able to keep your receiver open beyond 10 minutes of receiving the initial message, then you can use it to complete the message. One way to keep the receiver open is to use it to renew message locks. Another way is to use it to receive other messages. In your case, you are letting the receiver go idle after receiving the initial message, and it gets closed after 10 minutes. It can no longer be used to complete the message once it gets closed by the service.

The reason this works when using the same receiver to renew message locks is that the receiver does not go idle. There is nothing special about using the same receiver for renewal other than the fact that it keeps the receiver from being idle.

JoshLove-msft · Aug 08 '22 17:08
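
A minimal sketch of the workaround this implies, adapted from the repro above (the helper name and the cancellation handling are illustrative, not from the SDK or this thread): renew the lock on the same receiver that received the message and will complete it, so that receiver's link never sits idle for 10 minutes.

async Task RenewLockOnSameReceiverAsync(
    ServiceBusReceiver receiver,
    ServiceBusReceivedMessage message,
    TimeSpan renewalInterval,
    CancellationToken cancellationToken)
{
    while (!cancellationToken.IsCancellationRequested)
    {
        try
        {
            // Wait for the next renewal; exit quietly when cancelled.
            await Task.Delay(renewalInterval, cancellationToken);
        }
        catch (TaskCanceledException)
        {
            return;
        }

        // Renewing on the receiver that will also call CompleteMessageAsync both
        // extends the message lock and keeps that receiver's link active.
        await receiver.RenewMessageLockAsync(message, cancellationToken);
    }
}

In the repro this would replace RenewLockAsync, with the original receiver passed in instead of a second receiver created inside the method.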

Reopening until we document the behavior of the 10 minute server idle timeout resulting in link close and the fact that renewing a message lock keeps the receive link open.

JoshLove-msft · Aug 08 '22 17:08

Another way is to use it to receive other messages. In your case, you are letting the receiver go idle after receiving the initial message, and it gets closed after 10 minutes.

I think that is actually happening in our case. For example, we allow a processing concurrency of 10 but there is only a single message in the queue, which means 9 receive operations are just waiting for a message. This can also happen when no messages are in the queue at all (meaning the queue is more or less idle).

ramonsmits · Aug 09 '22 12:08

Hi @ramonsmits, since you haven’t asked that we “/unresolve” the issue, we’ll close this out. If you believe further discussion is needed, please add a comment “/unresolve” to reopen the issue.

ghost · Aug 16 '22 16:08