azure-sdk-for-net
[BUG] RenewMessageLock fails if the topic & sub have been re-created after the clients were created.
Library name and version
Azure.Messaging.ServiceBus 7.12.0
Describe the bug
Consider the following sample.
using Azure.Messaging.ServiceBus;
using Azure.Messaging.ServiceBus.Administration;

const string topicName = "someTopic";
const string subscriptionName = "default";

static void Log(string message) => Console.WriteLine($"[{DateTime.Now:T}] {message}");

var administrationClient = new ServiceBusAdministrationClient(administrationConnectionString);
if (await administrationClient.TopicExistsAsync(topicName))
{
    await administrationClient.DeleteTopicAsync(topicName);
    Log($"Topic {topicName} deleted");
}

var serviceBusClient = new ServiceBusClient(sendReceiveConnectionString);
await using ServiceBusSender sender = serviceBusClient.CreateSender(topicName);
await using ServiceBusReceiver receiver = serviceBusClient.CreateReceiver(topicName, subscriptionName, new ServiceBusReceiverOptions
{
    ReceiveMode = ServiceBusReceiveMode.PeekLock
});

for (int i = 0; i < 2; i++)
{
    await administrationClient.CreateTopicAsync(new CreateTopicOptions(topicName)
    {
        AutoDeleteOnIdle = TimeSpan.FromMinutes(10)
    });
    Log($"Topic {topicName} created");

    await administrationClient.CreateSubscriptionAsync(topicName, subscriptionName);
    Log($"Subscription {topicName}/{subscriptionName} created");

    var messageBody = $"Message {i}";
    await sender.SendMessageAsync(new ServiceBusMessage(messageBody));
    Log($"{messageBody} sent");

    var message = await receiver.ReceiveMessageAsync(TimeSpan.FromMinutes(1));
    Log($"{message.Body} received. MessageId = {message.MessageId}, LockToken = {message.LockToken}, LockedUntil = {message.LockedUntil}");

    await Task.Delay(2000); // to demonstrate that the message lock's extension takes place

    try
    {
        await receiver.RenewMessageLockAsync(message);
        Log($"{message.Body}'s lock extended to: {message.LockedUntil}");
    }
    catch (Exception e)
    {
        Log($"Failed to extend the lock for message, MessageId = {message.MessageId}. Exception: {e}");
    }

    await receiver.CompleteMessageAsync(message);
    Log($"{message.Body} completed");

    await administrationClient.DeleteTopicAsync(topicName);
    Log($"Topic {topicName} deleted");
}
Expected behavior
Both messages are sent, received in peek-lock mode, have their locks renewed, and are completed.
Actual behavior
Note that even though RenewLock failed, the subsequent Complete succeeded. Somehow send, peek-lock & complete can handle the fact that the topic & subscription have been re-created; only RenewLock fails.
[16:44:29] Topic someTopic created
[16:44:29] Subscription someTopic/default created
[16:44:30] Message 0 sent
[16:44:30] Message 0 received. MessageId = f860aca47828431c8acb706edf689478, LockToken = a25fbb83-4f38-4c45-89cc-5792ea90774b, LockedUntil = 10.02.2023 15:45:31 +00:00
[16:44:32] Message 0's lock extended to: 10.02.2023 15:45:33 +00:00
[16:44:32] Message 0 completed
[16:44:33] Topic someTopic deleted
[16:44:34] Topic someTopic created
[16:44:34] Subscription someTopic/default created
[16:44:34] Message 1 sent
[16:44:34] Message 1 received. MessageId = b670d906bbcd47b088e4ddba04546fad, LockToken = 0fcf058d-2316-43b4-b7b7-0e5c35c8fb75, LockedUntil = 10.02.2023 15:45:35 +00:00
[16:44:48] Failed to extend the lock for message, MessageId = b670d906bbcd47b088e4ddba04546fad. Exception: Azure.Messaging.ServiceBus.ServiceBusException: The link 'G18:RR:01234567890:MYNAMESPACENAME:Topic:sometopic|default$management:28:sender' is force detached. Code: ServerError. Details: AmqpControlProtocolClient.Fault. TrackingId:c5974cbf-91bc-4445-9052-2ecdd8aec1e5_B19, SystemTracker:MYNAMESPACENAME:Topic:sometopic|default, Timestamp:2023-02-10T15:44:49 Reference:c29df406-c7ed-4a99-85a2-071212499bea, TrackingId:a628d6af-d546-480e-9ebd-1557b9128bdd_G18, SystemTracker:NoSystemTracker, Timestamp:2023-02-10T15:44:49 (GeneralError). For troubleshooting information, see https://aka.ms/azsdk/net/servicebus/exceptions/troubleshoot.
at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver.RenewMessageLockInternalAsync(Guid lockToken, TimeSpan timeout)
at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver.<>c.<<RenewMessageLockAsync>b__64_0>d.MoveNext()
--- End of stack trace from previous location ---
at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.RunOperation[T1,TResult](Func`4 operation, T1 t1, TransportConnectionScope scope, CancellationToken cancellationToken, Boolean logTimeoutRetriesAsVerbose)
at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.RunOperation[T1,TResult](Func`4 operation, T1 t1, TransportConnectionScope scope, CancellationToken cancellationToken, Boolean logTimeoutRetriesAsVerbose)
at Azure.Messaging.ServiceBus.Amqp.AmqpReceiver.RenewMessageLockAsync(Guid lockToken, CancellationToken cancellationToken)
at Azure.Messaging.ServiceBus.ServiceBusReceiver.RenewMessageLockAsync(Guid lockToken, CancellationToken cancellationToken)
at Azure.Messaging.ServiceBus.ServiceBusReceiver.RenewMessageLockAsync(ServiceBusReceivedMessage message, CancellationToken cancellationToken)
at Program.<Main>$(String[] args) in C:\Users\MyName\Source\Repos\ServiceBusRenewLockRepro\Program.cs:line 56
[16:44:48] Message 1 completed
[16:44:48] Topic someTopic deleted
Reproduction Steps
Run the sample.
Environment
Sample running against .NET 6, Windows 10 x64. Also observed in Azure App Service hosting a .NET 4.8 app.
Thank you for your feedback. Tagging and routing to the team member best able to assist.
This appears to be a difference in the service behavior between the send/receive links and the management link when an entity is deleted. For send and receive links, the service lets the client know that the links should be closed; the client does so and then reopens them on the next operation. For management links (which the renew lock operation uses), no such communication comes back from the service, so the client attempts to communicate on the same link, which ends up causing an error on the service. I've reached out to the service team to see if they have more context about why it works this way.
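Based on that description, one possible client-side mitigation is to retry the renewal once: if the first failure closes the stale management link, the retry should go out over a freshly opened link. This is only a sketch under that assumption, and the helper name RenewWithRetryAsync is hypothetical, not an SDK API.

using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

// Hypothetical helper: retry a failed lock renewal once, assuming the first
// failure after entity recreation tears down the stale management link and
// the retry opens a new one.
static async Task RenewWithRetryAsync(ServiceBusReceiver receiver, ServiceBusReceivedMessage message)
{
    try
    {
        await receiver.RenewMessageLockAsync(message);
    }
    catch (ServiceBusException ex) when (ex.Reason == ServiceBusFailureReason.GeneralError)
    {
        // The faulted call should have closed the stale link; a single retry
        // may then succeed on a newly created link.
        await receiver.RenewMessageLockAsync(message);
    }
}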
The observation is correct. Send and receive links are closed when an entity is deleted, but request-response links are not. So operations on request-response links, like RenewLock, fail with an exception, and only then is the link closed. I don't see anything wrong with this behavior; in fact, it helps to avoid unnecessary errors during service upgrades.
This is not a normal use case. Senders and receivers recovering in this case is just a side effect, not the intended effect. Senders and receivers recover to handle network glitches, service upgrades, and the like, not this case of deleting an entity and recreating another entity with the same name. It is a new entity; ideally, senders and receivers created for the old entity shouldn't work with the new one.
This behavior is observed because of this peculiar test. I see no reason to fix it.
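Following that reasoning, here is a sketch of how the repro could be restructured so that no client outlives the entity it was created for: scope the sender and receiver to each incarnation of the topic instead of reusing them across delete/recreate cycles. This assumes each new ServiceBusSender/ServiceBusReceiver opens its own links (including the management link); topicName, subscriptionName, administrationClient, serviceBusClient, and Log are as in the original sample.

for (int i = 0; i < 2; i++)
{
    await administrationClient.CreateTopicAsync(new CreateTopicOptions(topicName)
    {
        AutoDeleteOnIdle = TimeSpan.FromMinutes(10)
    });
    await administrationClient.CreateSubscriptionAsync(topicName, subscriptionName);

    // Scope the sender and receiver to this incarnation of the entity so that
    // no stale link outlives the topic it was opened against.
    await using (ServiceBusSender sender = serviceBusClient.CreateSender(topicName))
    await using (ServiceBusReceiver receiver = serviceBusClient.CreateReceiver(topicName, subscriptionName))
    {
        await sender.SendMessageAsync(new ServiceBusMessage($"Message {i}"));
        ServiceBusReceivedMessage message = await receiver.ReceiveMessageAsync(TimeSpan.FromMinutes(1));
        await receiver.RenewMessageLockAsync(message);
        await receiver.CompleteMessageAsync(message);
        Log($"{message.Body} completed");
    }

    await administrationClient.DeleteTopicAsync(topicName);
}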
Hi @dzendras. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text “/unresolve” to remove the “issue-addressed” label and continue the conversation.
Thanks @yvgopal. @dzendras, this is just a difference in the way the service implements handling of a deleted entity between the different types of links. Because the sender/receiver links are closed, the client automatically recreates the links, whereas the same is not true for renewing message locks. As this is not a common scenario that users would need to handle in their applications (and it isn't clear that an application would even want to process the new entity with the same name in the same way), I'm going to mark this as addressed. Please let us know if you have further questions on this behavior.
/unresolve
@JoshLove-msft @yvgopal Thank you for your explanations. Sorry to have not responded earlier.
Let me show you my point of view on this case. I see this behaviour as inconsistent. I would totally understand if send & receive also failed after deleting the topic; then it would be clear that the sender & receiver instances were "broken" and needed to be re-created. The fact that some operations succeed and some fail on the same client instance is counterintuitive and inconsistent. I am in no position to tell you whether it is a service or an SDK issue. From my point of view, I use a single instance provided by the SDK, which gives me abstractions that allow me to be unaware of whether operation X is implemented using request-response links and operation Y using some other kind of link.
@yvgopal With all respect, I do not think it is sensible to assess whether the scenario is peculiar or not. It is a series of valid operations that is not served properly. I could not find any documentation describing it as unsupported; if there is any, please send me a link. Obviously I could not paste the actual product code, which is why I prepared this repro. It represents a valid business use case I cannot disclose.
We have reassigned an engineer to look further into this issue. We will update this issue when we have more information.
@dzendras I repeat: this use case is not common. The test works fine in a premium messaging namespace but fails in a standard messaging namespace. You could also create a queue with the same name in the second iteration, after deleting the topic in the first; then the test would fail in a different way. The fact is that the recreated topic is not the same topic, even though it has the same name.
I don't see it as a bug. One can argue that it causes some inconvenience in cases like the one you showed here. There are trade-offs to removing that inconvenience. I don't believe fixing it is worth the trade-offs. So we are not going to fix it anytime soon.
If multiple or many customers are experiencing this inconvenience, then we will consider fixing it. For now, I am closing this issue.