azure-sdk-for-net
[BUG] Message lock renewal not working for 10+ minute processing durations when using different `ServiceBusReceiver` instances for `CompleteMessageAsync` and `RenewMessageLockAsync`
Library name and version
Azure.Messaging.ServiceBus 7.8.0, 7.8.1 and 7.9.0
Describe the bug
When using transactions and invoking `RenewMessageLockAsync` on a different `ServiceBusReceiver`, and processing durations are over 10 minutes, then invoking `CompleteMessageAsync` CAN result in a `MessageLockLost` if that is the only message concurrently being processed.
A similar bug was reported earlier. That one was fixed in 7.8.0, for the case where `RenewMessageLockAsync` is invoked on the same receiver as `CompleteMessageAsync`.
- https://github.com/Azure/azure-sdk-for-net/issues/28025
Expected behavior
The client does not get into a faulty state where lock renewal seems to succeed but `CompleteMessageAsync` then fails with `MessageLockLost`.
Actual behavior
The client can get into a faulty state where:
- any message that will be ack'd via `CompleteMessageAsync` where processing is longer than the queue lock duration will fail
- any message that results in a lock renewal via `RenewMessageLockAsync` will incorrectly succeed
Reproduction Steps
```csharp
var lockDuration = TimeSpan.FromMinutes(5);
var renewalInterval = lockDuration - TimeSpan.FromSeconds(10);
var queueName = "test-" + DateTimeOffset.UtcNow.ToUnixTimeMilliseconds();
var cs = Environment.GetEnvironmentVariable("AzureServiceBus_ConnectionString");

var admin = new ServiceBusAdministrationClient(cs);
await admin.CreateQueueAsync(new CreateQueueOptions(queueName) { LockDuration = lockDuration, AutoDeleteOnIdle = TimeSpan.FromMinutes(10) });

var sw = Stopwatch.StartNew();
using var logger = new AzureEventSourceListener((ea, m) => Console.Out.WriteLineAsync($"{sw.Elapsed,-15:g} [{ea.Level,-13}] {m}"), EventLevel.Verbose);

await using var client = new ServiceBusClient(cs, new ServiceBusClientOptions { EnableCrossEntityTransactions = true });
await using var sender = client.CreateSender(queueName);
await sender.SendMessageAsync(new ServiceBusMessage());

await using var receiver = client.CreateReceiver(queueName);
var message = await receiver.ReceiveMessageAsync();
Trace.Assert(message != null);

// Renew the lock on a *different* receiver for just over 10 minutes.
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(11));
await RenewLockAsync(message!, cts.Token);

// Complete on the original receiver inside a transaction;
// this is where MessageLockLost can occur.
using var ts = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled);
await sender.SendMessageAsync(new ServiceBusMessage());
await receiver.CompleteMessageAsync(message);
ts.Complete();

async Task RenewLockAsync(ServiceBusReceivedMessage message, CancellationToken cancellationToken)
{
    var renewLockReceiver = client.CreateReceiver(queueName); // separate receiver instance
    while (!cancellationToken.IsCancellationRequested)
    {
        // Await the delay without throwing when the token fires.
        var delayTask = await Task.Delay(renewalInterval, cancellationToken)
            .ContinueWith(t => t, TaskContinuationOptions.ExecuteSynchronously);
        if (delayTask.IsCanceled) return;
        await renewLockReceiver.RenewMessageLockAsync(message);
    }
}
```
Environment
- net6
- windows 10
Thank you for your feedback. Tagging and routing to the team member best able to assist.
@JoshLove-msft it seems 7.8.1 caused a regression. Behavior is happening in both 7.8.1 and 7.9.0.
But you cannot reproduce in 7.8.0?
@JoshLove-msft good question. I thought I had tested that after its release, as it should have contained the earlier fix, but not today. Let me test that for you and I'll report back. Currently I'm assuming 7.8.0 had the fix.
I have not been able to repro the issue on 7.9.0 using your code. Here are the relevant logs. The send link closes due to the idle timeout as expected, but the complete is still successful as the receive link remains open.
```
0:09:44.7111341 [Informational] test-1659636468657-6aaa3c1d-073f-4520-bfdc-d1e9a2fb1a8e: RenewLockAsync start. MessageCount = 1, LockToken = 1b0df878-9929-41e6-b76e-f8f31603fb55
0:09:44.883603  [Informational] test-1659636468657-6aaa3c1d-073f-4520-bfdc-d1e9a2fb1a8e: RenewLockAsync done. LockToken = 1b0df878-9929-41e6-b76e-f8f31603fb55
0:10:03.2098949 [Informational] Send Link Closed. Identifier: test-1659636468657-498474e3-0fda-4c84-aa30-51c9f823c6ee, linkException: Azure.Messaging.ServiceBus.ServiceBusException: The link 'G8:443458260:amqps://jolovsbdemo.servicebus.windows.net/-5a023aac;0:5:8' is force detached. Code: publisher(link20190829). Details: AmqpMessagePublisher.IdleTimerExpired: Idle timeout: 00:10:00. (GeneralError).
0:11:03.9606736 [Informational] test-1659636468657-498474e3-0fda-4c84-aa30-51c9f823c6ee: SendAsync start. MessageCount = 1
0:11:04.4098184 [Informational] AmqpTransactionDeclared for LocalTransactionId: b7b604f7-8929-4c97-a971-106368e16dee:1 AmqpTransactionId: txn:b418b91f925c4c23be4bb60e9aae2945__G8:24852075_G8.
0:11:04.4117466 [Informational] Creating send link for Identifier: test-1659636468657-498474e3-0fda-4c84-aa30-51c9f823c6ee.
0:11:04.4124126 [Verbose      ] test-1659636468657-498474e3-0fda-4c84-aa30-51c9f823c6ee: Requesting authorization to amqps://jolovsbdemo.servicebus.windows.net/test-1659636468657
0:11:04.587461  [Verbose      ] test-1659636468657-498474e3-0fda-4c84-aa30-51c9f823c6ee: Authorization to amqps://jolovsbdemo.servicebus.windows.net/test-1659636468657 complete. Expiration time: 08/04/2022 18:37:51
0:11:04.7621102 [Informational] Send link created for Identifier: test-1659636468657-498474e3-0fda-4c84-aa30-51c9f823c6ee.
0:11:04.941077  [Informational] test-1659636468657-498474e3-0fda-4c84-aa30-51c9f823c6ee: SendAsync done.
0:11:04.9468972 [Informational] test-1659636468657-6aaa3c1d-073f-4520-bfdc-d1e9a2fb1a8e: CompleteAsync start. MessageCount = 1, LockToken = 1b0df878-9929-41e6-b76e-f8f31603fb55
0:11:05.2011377 [Informational] test-1659636468657-6aaa3c1d-073f-4520-bfdc-d1e9a2fb1a8e: CompleteAsync done. LockToken = 1b0df878-9929-41e6-b76e-f8f31603fb55
0:11:05.4151386 [Informational] AmqpTransactionDischarged for LocalTransactionId: b7b604f7-8929-4c97-a971-106368e16dee:1 AmqpTransactionId: txn:b418b91f925c4c23be4bb60e9aae2945__G8:24852075_G8 Rollback: False.
0:11:05.4192947 [Verbose      ] Closing a ServiceBusReceiver (Identifier 'test-1659636468657-6aaa3c1d-073f-4520-bfdc-d1e9a2fb1a8e').
0:11:05.4354443 [Verbose      ] A ServiceBusReceiver has been closed (Identifier 'test-1659636468657-6aaa3c1d-073f-4520-bfdc-d1e9a2fb1a8e').
0:11:05.451127  [Verbose      ] Closing a ServiceBusSender (Identifier 'test-1659636468657-498474e3-0fda-4c84-aa30-51c9f823c6ee').
0:11:05.4552164 [Verbose      ] A ServiceBusSender has been closed (Identifier 'test-1659636468657-498474e3-0fda-4c84-aa30-51c9f823c6ee').
0:11:05.4582133 [Verbose      ] Closing a ServiceBusClient (Identifier 'jolovsbdemo.servicebus.windows.net-2eba0a28-4fc2-4097-894d-444ddcc79d0b').
0:11:05.4746853 [Verbose      ] A ServiceBusClient has been closed (Identifier 'jolovsbdemo.servicebus.windows.net-2eba0a28-4fc2-4097-894d-444ddcc79d0b').
```
@JoshLove-msft It seems my repro code was incorrect and I found the difference. I've updated the description with the following:

> When using transactions and invoking `RenewMessageLockAsync` on a different `ServiceBusReceiver`, and processing durations are over 10 minutes, then invoking `CompleteMessageAsync` CAN result in a `MessageLockLost` if that is the only message concurrently being processed.
The important question is: is this actually a supported scenario?
- If yes, then it should be fixed.
- If no, then invoking `RenewMessageLockAsync` on a different `ServiceBusReceiver` instance should result in an exception.
Renewing a message with a different receiver is supported. Keeping the receive link alive past the 10 minute idle time enforced by the service will not happen if you are using a different receiver. This explains the behavior that you are seeing.
Hi @ramonsmits. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.
@JoshLove-msft I honestly find the response a little confusing. To me it reads as if this is unsupported.
I don't think this issue can be closed yet. This behavior should at least be documented somewhere.
My interpretation is: "It is unsupported, but works if done within the 10-minute idle time."
If you take into consideration that the maximum value of `LockDuration` is 5 minutes, then obviously processing durations past 5 minutes require lock renewal, but it is strange that the idle time is the limiting factor here. For example, with concurrent message processing a message CAN actually be processed for much longer than 10 minutes, because the receiver isn't idle.
For now we need to interpret this as unsupported, as processing durations over 10 minutes must be expected.
Renewing a lock with a different receiver is supported for an unlimited amount of time. You can keep renewing the lock as long as you want and the message will not be delivered to any receivers during this time. Independent of this, the service has behavior where it closes any idle receiver links after 10 minutes. If you are able to keep your receiver open beyond 10 minutes of receiving the initial message, then you can use it to complete the message. One way to keep the receiver open is to use it to renew message locks. Another way is to use it to receive other messages. In your case, you are letting the receiver go idle after receiving the initial message, and it gets closed after 10 minutes. It can no longer be used to complete the message once it gets closed by the service.
The reason that this works when using the same receiver to renew message locks is that the receiver does not go idle. There is nothing special about using the same receiver to renew message locks other than the fact that it causes the receiver to not be idle.
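A minimal sketch of the pattern described here (untested against a live namespace; it reuses the `client`, `queueName`, `cts`, and `renewalInterval` names from the repro above, and `LongRunningProcessingAsync` is a hypothetical placeholder for the actual work): renewing on the same receiver that received the message keeps that receiver's link non-idle, so a later `CompleteMessageAsync` on it can still succeed.

```csharp
// Sketch, not a verified fix: renew on the SAME receiver that will complete.
// Each RenewMessageLockAsync call is traffic on this receiver's link, so the
// service's 10-minute idle timer should not fire while renewals continue.
await using var receiver = client.CreateReceiver(queueName);
var message = await receiver.ReceiveMessageAsync();

var renewalLoop = Task.Run(async () =>
{
    while (!cts.Token.IsCancellationRequested)
    {
        await Task.Delay(renewalInterval, cts.Token);
        await receiver.RenewMessageLockAsync(message); // keeps the link non-idle
    }
});

await LongRunningProcessingAsync(message); // hypothetical work, may exceed 10 minutes
cts.Cancel();
await receiver.CompleteMessageAsync(message); // link is still open
```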
Reopening until we document the behavior of the 10 minute server idle timeout resulting in link close and the fact that renewing a message lock keeps the receive link open.
> Another way is to use it to receive other messages. In your case, you are letting the receiver go idle after receiving the initial message, and it gets closed after 10 minutes.
I think in our case that is actually what happens. For example, we allow a processing concurrency of 10 but then have only a single message in the queue. That means 9 receive operations are waiting for a message. If no messages are in the queue (meaning the queue is more or less idle), this can happen too.
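The scenario described above could be sketched as follows (untested; `client` and `queueName` come from the repro earlier in the thread, and `LongRunningWorkAsync` is a hypothetical placeholder): a `ServiceBusProcessor` with a concurrency of 10 draining a mostly empty queue, relying on the SDK's auto lock renewal instead of a manual renewal loop.

```csharp
// Sketch of the described scenario: concurrency of 10 with a mostly idle queue,
// so 9 receive operations sit waiting while a single message is processed.
await using var processor = client.CreateProcessor(queueName, new ServiceBusProcessorOptions
{
    MaxConcurrentCalls = 10,
    AutoCompleteMessages = false,
    // Let the SDK renew the lock; note the renewals go through the
    // processor's own receiver, keeping its link non-idle.
    MaxAutoLockRenewalDuration = TimeSpan.FromMinutes(30),
});

processor.ProcessMessageAsync += async args =>
{
    await LongRunningWorkAsync(args.Message); // hypothetical, may exceed 10 minutes
    await args.CompleteMessageAsync(args.Message);
};
processor.ProcessErrorAsync += args => Task.CompletedTask;

await processor.StartProcessingAsync();
```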
Hi @ramonsmits, since you haven't asked that we "/unresolve" the issue, we'll close this out. If you believe further discussion is needed, please add a comment "/unresolve" to reopen the issue.