azure-sdk-for-js
Service Bus v7.4.0 (the latest version) hangs receiving messages
- Package Name: @azure/service-bus
- Package Version: 7.4.0
- Operating system: Linux (Debian)
- [x] nodejs
  - version: 14.15.0
- [ ] browser
  - name/version:
- [ ] typescript
  - version:
- Is the bug related to documentation in
  - [ ] README.md
  - [ ] source code documentation
  - [ ] SDK API docs on https://docs.microsoft.com
Describe the bug: receiveMessages hangs.
To Reproduce: code like the following:
this.receiver = this.sbClient.createReceiver(this.queueName, {
  receiveMode: 'peekLock',
  maxAutoLockRenewalDurationInMs: 0 // set to 0 to disable auto lock renewal, because it does not work as expected!
});
while (true) {
  try {
    // NOTE: asking for 20 messages does not guarantee that we will return
    // all 20 at once, so we must loop until we get all the messages we expected.
    /**
     * @throws Error if the underlying connection, client or receiver is closed.
     * @throws Error if the current receiver is already in the state of receiving messages.
     * @throws `ServiceBusError` if the service returns an error while receiving messages.
     */
    const messages = await this.receiver.receiveMessages(messageCount, {
      maxWaitTimeInMs: 2 * 1000,
    });
    if (!messages.length) {
      logger.debug('queue is empty');
      continue;
    }
    messages.forEach((msg) => onMessageHandler(msg));
  } catch (err) {
    await onErrorHandler(new QueueError(err));
  }
}
Thanks for reporting @winterAndvelvet
While we look to replicate the issue, I noticed that you are calling your own message and error handlers. If you already have these handlers set up, can you share why you chose to use the receiveMessages() method in a loop instead of the subscribe() method that will take your handlers?
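For context, the subscribe() pattern referred to here hands your handlers to the SDK, which pumps messages to them as they arrive. Below is a minimal sketch of the shape of that handler pair; `makeStubReceiver` and its message list are illustrative stand-ins (not SDK code) so the sketch runs without a Service Bus namespace. With a real `ServiceBusReceiver`, `receiver.subscribe()` takes the same `{ processMessage, processError }` object.

```javascript
// Stub standing in for sbClient.createReceiver(queueName); a real receiver's
// subscribe() takes the same { processMessage, processError } handler pair.
function makeStubReceiver(messages) {
  return {
    subscribe({ processMessage, processError }) {
      // A real receiver delivers messages asynchronously as they arrive;
      // the stub replays a fixed list synchronously for illustration.
      for (const msg of messages) {
        try {
          processMessage(msg);
        } catch (err) {
          processError({ error: err });
        }
      }
      return { close: async () => {} };
    },
  };
}

const received = [];
const receiver = makeStubReceiver([{ body: "a" }, { body: "b" }]);
receiver.subscribe({
  processMessage: (msg) => { received.push(msg.body); },
  processError: (args) => { console.error("queue error:", args.error); },
});
console.log(received.join(","));
```

With subscribe(), concurrency is controlled via the `maxConcurrentCalls` receive option rather than by polling in a loop.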
For easier concurrency control, we chose the polling style. For the first few hours it worked well, but receiving messages started to hang once the queue's incoming message count grew to about 4k. I don't know whether the hang is related to the incoming message count in the queue. In addition, I tried wrapping receiveMessages in a timeout, but after the timeout fires, the next loop iteration throws "The receiver is already receiving messages". This also proves receiveMessages is really hung. Code like:
async startGetMessageLoop(onMessageHandler, onErrorHandler, options) {
  let { trafficController, getMsgTimer } = options;
  while (true) {
    try {
      if (global.isReceiveExit) {
        logger.warn('received sigterm, breaking out of the receive-message loop');
        break;
      }
      await trafficController.waitTrafficBlockLeave();
      let messageCount = trafficController.nextTrafficWindow();
      logger.debug(`next traffic window: ${messageCount}`);
      // NOTE: asking for 20 messages does not guarantee that we will return
      // all 20 at once, so we must loop until we get all the messages we expected.
      /**
       * @throws Error if the underlying connection, client or receiver is closed.
       * @throws Error if the current receiver is already in the state of receiving messages.
       * @throws `ServiceBusError` if the service returns an error while receiving messages.
       */
      // bug: https://github.com/Azure/azure-sdk-for-js/issues/20274
      const messages = await Promise.race([
        this.receiver.receiveMessages(messageCount, {
          maxWaitTimeInMs: 2 * 1000,
        }),
        new Promise((_, reject) => {
          setTimeout(() => reject(new Error('receive timeout')), 5000);
        }),
      ]);
      if (!messages.length) {
        logger.debug('queue is empty');
        continue;
      }
      messages.forEach((msg) => onMessageHandler(msg));
    } catch (err) {
      if (err.message === 'receive timeout') {
        logger.error('receive msg timeout, continuing');
        continue;
      }
      await onErrorHandler(new QueueError(err));
    }
    await sleep(getMsgTimer);
  }
}

By the way, when using:
await this.receiver.receiveMessages(messageCount, {
  maxWaitTimeInMs: 2 * 1000,
})
we also get the "current receiver is already in state of receiving messages" exception. Does maxWaitTimeInMs not work?
@winterAndvelvet thank you for the detailed information! I will look into this.
@jeremymeng thank you
@winterAndvelvet You are right that the "isReceivingMessages" flag is only cleared at the end of the receiveMessages() call. I would expect maxWaitTimeInMs to work, but from the above code and logs I could not see what went wrong. Is it possible for you to enable more logging and see whether anything unusual or erroneous shows up when the issue starts happening? Please refer to the troubleshooting section for more information on getting debug logs:
https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/servicebus/service-bus/README.md#troubleshooting
It is probably useful to start with this configuration:
export DEBUG=azure*,rhea*,-rhea:raw,-rhea:message,-azure:core-amqp:datatransformer
@winterAndvelvet while we are figuring out the root cause of the issue, one workaround is to pass an abort signal to the receiveMessages() call then handle the error:
import { AbortController } from "@azure/abort-controller";
// ...
    const messages = await this.receiver.receiveMessages(messageCount, {
      maxWaitTimeInMs: 3 * 1000,
      abortSignal: AbortController.timeout(2 * 1000),
    });
    // ...
  } catch (err) {
    if (err.name === "AbortError") {
      console.log("2 seconds passed, abort. will try receiving again");
      continue;
    }
    await onErrorHandler(new Error(err));
  }
@winterAndvelvet have you had a chance to enable more detailed logging or try out the workaround using the abort signal?
Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!
Hello. I'm having this same issue. receiver.receiveMessages freezes if it tries to receive more than 2000 messages at once, and maxWaitTimeInMs is not respected, so it hangs and requires a restart. It works fine when receiving fewer than 2000 messages.
It also freezes if I batch-process 200 messages at a time until 2000 have been requested. Once 2000 are locked, it will not receive any more messages from the queue (the 2000 limit is approximately where the issue occurs; it might be more or fewer in reality).
Receiving messages...
Received 250 messages
Not enough messages to process yet, waiting for 4446 total messages
Processing all messages took: 0.493ms
Received 250 messages
Not enough messages to process yet, waiting for 4446 total messages
Processing all messages took: 0.339ms
Received 250 messages
Not enough messages to process yet, waiting for 4446 total messages
Processing all messages took: 0.357ms
Received 250 messages
Not enough messages to process yet, waiting for 4446 total messages
Processing all messages took: 0.324ms
Received 250 messages
Not enough messages to process yet, waiting for 4446 total messages
Processing all messages took: 0.384ms
Received 250 messages
Not enough messages to process yet, waiting for 4446 total messages
Processing all messages took: 0.319ms
Received 250 messages
Not enough messages to process yet, waiting for 4446 total messages
Processing all messages took: 0.336ms
Received 250 messages
Not enough messages to process yet, waiting for 4446 total messages
Processing all messages took: 0.342ms
Error [AbortError]: The operation was aborted.
at new AbortError
Finally called
Here you can see that it receives the messages in a loop perfectly fine until exactly 2000 have been received. At this point, receiver.receiveMessages hangs and the abort controller fires.
Code:
import { AbortController } from "@azure/abort-controller";

let messageBatchSize = 250;
let messagesRequiredToStartProcessing = 4446;
let allMessages = [];
try {
  const messages = await receiver.receiveMessages(messageBatchSize, {
    maxWaitTimeInMs: 0.5 * 1000,
    abortSignal: AbortController.timeout(10 * 1000),
  });
  console.time(`Processing all messages took`);
  console.log(`Received ${messages.length} messages`);
  allMessages.push(...messages);
  if (allMessages.length < messagesRequiredToStartProcessing) {
    // Don't process until `messagesRequiredToStartProcessing` messages are received
    console.log(`Not enough messages to process yet, waiting for ${messagesRequiredToStartProcessing} total messages`);
    console.timeEnd(`Processing all messages took`);
    continue;
  }
} catch (error) {
  console.log("Assigned Seating Queue Error: ", error);
} finally {
  console.log('Assigned Seating Queue: Finally called');
}
@CodeJuggernaut Thank you very much for the logs and repro code. I will have another look this week. Is it possible for you to enable more detailed logging and share the logs from when this issue happens, by setting the following environment variable before running the test?
export DEBUG=azure*,rhea*,-rhea:raw,-rhea:message,-azure:core-amqp:datatransformer
Hi, thanks for the fast reply. I enabled the debug logs and it's printing out tens of thousands of lines of debug output... which ones should I share?
Is there a secure way to share these logs with you?
@CodeJuggernaut It would be helpful to see logs around when the test appears stuck then the call is aborted. Let me think about the secure way of sharing.
@CodeJuggernaut it looks like there's some internal limitation of around 2000 messages that we can handle before they are settled (complete/abandon/defer/dead-letter). I will try to find out more, but as a workaround, can you lower the number to 2000 and ensure they are processed before receiving the next 2000?
let messagesRequiredToStartProcessing = 2000;
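That cap-then-settle workaround could be sketched as a loop that never holds more than 2000 locked-but-unsettled messages. `stubReceiver` below is an illustrative stand-in so the sketch runs without a Service Bus namespace; with the real SDK you would call `receiver.receiveMessages()` and `receiver.completeMessage()` on a `ServiceBusReceiver`.

```javascript
// Never hold more than BATCH_LIMIT locked-but-unsettled messages: settle
// (complete) each batch fully before asking the receiver for the next one.
const BATCH_LIMIT = 2000;

async function drainInBatches(receiver, totalToProcess) {
  let processed = 0;
  while (processed < totalToProcess) {
    const want = Math.min(BATCH_LIMIT, totalToProcess - processed);
    const batch = await receiver.receiveMessages(want, { maxWaitTimeInMs: 500 });
    if (batch.length === 0) break; // queue drained
    for (const msg of batch) {
      await receiver.completeMessage(msg); // settle before receiving again
    }
    processed += batch.length;
  }
  return processed;
}

// Minimal in-memory stub so the sketch runs standalone.
function stubReceiver(queue) {
  return {
    async receiveMessages(max) { return queue.splice(0, max); },
    async completeMessage() {},
  };
}

const queue = Array.from({ length: 4446 }, (_, i) => ({ body: i }));
drainInBatches(stubReceiver(queue), 4446).then((n) =>
  console.log(`settled ${n} messages`)
);
```

In the real repro, processing would happen between receiving the batch and completing it; the point is only that completion happens before the next receive.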
@CodeJuggernaut just an update that this is actually a known issue: https://github.com/Azure/azure-sdk-for-js/issues/11633. While we are working on a solution, I wonder whether the above workaround would work for you?
Hello @jeremymeng
The workaround of passing an abort signal to the receiveMessages() call does not work. After the receiveMessages() call aborts, it soon hangs again, as shown in the following figure. I feel I need to create a new receiver.
[screenshot: receiveMessages hangs again shortly after the abort]
@zfyseu is it possible for you to try the other mentioned workaround of processing at most 2000 messages in a batch?
@jeremymeng My receiver only receives one message at a time.
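The recreate-the-receiver idea could be sketched as below. `stubClient` is an illustrative stand-in for a `ServiceBusClient` whose first receiver always aborts, so the sketch runs standalone; with the real SDK, the AbortError would come from the abortSignal passed to receiveMessages(), and recreating the receiver creates a fresh AMQP link.

```javascript
// Sketch: if receiveMessages() aborts, close the (possibly stuck) receiver
// and create a fresh one before looping again.
async function receiveWithRecreate(client, queueName, rounds) {
  let receiver = client.createReceiver(queueName);
  const collected = [];
  for (let i = 0; i < rounds; i++) {
    try {
      const msgs = await receiver.receiveMessages(1, { maxWaitTimeInMs: 2000 });
      collected.push(...msgs);
    } catch (err) {
      if (err.name === "AbortError") {
        await receiver.close(); // discard the stuck receiver
        receiver = client.createReceiver(queueName);
        continue;
      }
      throw err;
    }
  }
  await receiver.close();
  return collected;
}

// Stub for illustration: the first receiver always aborts; replacements work.
function stubClient() {
  let created = 0;
  return {
    createdCount: () => created,
    createReceiver() {
      const isFirst = created++ === 0;
      return {
        async receiveMessages() {
          if (isFirst) {
            const e = new Error("The operation was aborted.");
            e.name = "AbortError";
            throw e;
          }
          return [{ body: "ok" }];
        },
        async close() {},
      };
    },
  };
}

const client = stubClient();
receiveWithRecreate(client, "myQueue", 3).then((msgs) => {
  console.log(`collected ${msgs.length} messages using ${client.createdCount()} receivers`);
});
```

Whether recreating the receiver actually clears the hang in the real SDK is exactly what this issue is trying to establish; the sketch only shows the loop shape.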
@zfyseu ah sorry I thought you were having the batching issue. Is it possible for you to turn on more detailed logging and share the logs when the SDK stops receiving messages?
export DEBUG=azure*,rhea*,-rhea:raw,-rhea:message,-azure:core-amqp:datatransformer
@jeremymeng OK, I'm a little busy these days. I'll collect the logs later.
Hi @jeremymeng, the receiver hang occurred around 2022-10-26T17:31:40 UTC. I took screenshots of some logs near that time.
[screenshots of debug logs around 2022-10-26T17:31:40 UTC]
@zfyseu Thank you for sharing the log. Just to confirm, this is with the abortSignal workaround? Do you see my logs for "[connection-8]" afterwards?
@jeremymeng Yes, this is with the abortSignal workaround, and the AbortSignal fires 5 s after maxWaitTimeInMs.
