[BUG] two listener outages on record, no root cause discovered
Describe the bug A process listening to a topic stops receiving messages. We don't know the root cause or when the outage begins. Our customers discover the problem first because the application begins to misbehave.
Restarting the process clears the problem. Once restarted, messages can be received.
How can I catch the error when it happens so I can restart the service before the customer notices?
How can I collect information that can narrow the problem description?
When Azure Service Bus is upgraded, can we get an announcement?
We are using the Premium tier of Service Bus.
Exception or Stack Trace None
To Reproduce Unknown.
Code Snippet Unknown.
Expected behavior Listener keeps getting messages without restarting the process.
Screenshots NA
Setup (please complete the following information):
- OS: Azure Service Bus broker, Linux/Java container
- IDE: IntelliJ
- Library/Libraries: spring-cloud-azure-starter-servicebus-jms, 5.14.0
- Java version: 17
- App Server/Environment: Spring boot 3.3.x
- Frameworks:Spring boot
Information Checklist Kindly make sure that you have added all the following information above and checkoff the required fields otherwise we will treat the issuer as an incomplete report
- [ ] Bug Description Added
- [ ] Repro Steps Added
- [ ] Setup information Added
@moarychan @netyyyy @rujche @saragluna
Thank you for your feedback. Tagging and routing to the team member best able to assist.
Thanks for reaching out, could you help provide a minimal project for us to reproduce this issue?
I don't know how to reproduce the problem.
The JMS clients we run are long-running services. They run for weeks between scheduled maintenance restarts. We've been in production for 12 months. The problem has been noticed only twice.
We have about 10 topics. We observe that only one of them is affected.
We have a very light load at this time. We experience frequent idle connection closings, and JMS re-establishes the connections successfully.
However, the connection-closing exceptions are the only disruption the JMS layer reports in the logs.
We don't have a sample application to demonstrate the problem.
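Since the JMS layer logs nothing when a listener goes quiet, one way to catch the stall before customers do is to track when each topic last delivered a message and alert or restart when one falls silent for too long. The following is a minimal sketch, not part of our current code: `ListenerStalenessMonitor`, `recordMessage`, and `staleTopics` are hypothetical names. It also assumes each topic carries regular traffic; under our light load that would require a periodic heartbeat message, which is a further assumption.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: remembers when each topic last delivered a message. */
final class ListenerStalenessMonitor {
    // topic name -> timestamp of the most recent message seen on that topic
    private final Map<String, Instant> lastSeen = new ConcurrentHashMap<>();

    /** Call from each listener method, or once at startup to register the topic. */
    void recordMessage(String topic, Instant receivedAt) {
        // Keep the latest timestamp even if calls arrive out of order.
        lastSeen.merge(topic, receivedAt,
                (previous, current) -> current.isAfter(previous) ? current : previous);
    }

    /** Returns, sorted by name, the topics silent for longer than maxSilence as of now. */
    List<String> staleTopics(Instant now, Duration maxSilence) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Instant> entry : lastSeen.entrySet()) {
            Duration silence = Duration.between(entry.getValue(), now);
            if (silence.compareTo(maxSilence) > 0) {
                stale.add(entry.getKey());
            }
        }
        Collections.sort(stale);
        return stale;
    }
}
```

A scheduled task could call `staleTopics` every minute and fail a health check when the list is non-empty, so the orchestrator restarts the container. Logging the stale topic name and the time of its last message would also narrow down when an outage starts.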
Could you help provide your configuration and pom file then?
The service in question subscribes to only 8 different topics.
- Some topics are from other services in our deployment. Some topics are used to propagate status only among the instances of this service. If one service has a state change, it must be shared with the other instances.
- Other topics receive messages from other distinct services.
We have observed message loss affecting the former topics: the topics that propagate status to other instances. So I will focus on these first. These topics are meant for 1 -> many broadcasts of messages. Each instance creates a subscription with a unique name using the Azure SDK. (This is our only direct dependency on the SDK. Everything else goes through the JMS library.)
`
private static SubscriptionProperties createSubscription(String topicName, String connectionString) {
    ServiceBusAdministrationClient serviceBusAdministrationClient =
        new ServiceBusAdministrationClientBuilder()
            .connectionString(connectionString)
            .buildClient();
    CreateSubscriptionOptions subscriptionOptions =
        new CreateSubscriptionOptions().setDefaultMessageTimeToLive(Duration.ofSeconds(16));
    String key = UUID.randomUUID().toString();
    SubscriptionProperties createdSubscription =
        serviceBusAdministrationClient.createSubscription(topicName, key, subscriptionOptions);
    log.info(
        "Created subscription {} for topic {}",
        createdSubscription.getSubscriptionName(),
        createdSubscription.getTopicName());
    return createdSubscription;
}
`
All our listeners for all topics have the same configuration for their JmsListenerContainerFactory:
`
DefaultJmsListenerContainerFactory listenerContainerFactory =
new DefaultJmsListenerContainerFactory();
listenerContainerFactory.setConnectionFactory(connectionFactory);
listenerContainerFactory.setSubscriptionDurable(true);
listenerContainerFactory.setSessionTransacted(true);
listenerContainerFactory.setSessionAcknowledgeMode(Session.CLIENT_ACKNOWLEDGE);
`
The publisher and the subscriber are the same service but different replicas. We are using pub-sub to propagate events among replicas. We currently have two replicas in each availability zone, for a total of 4 replicas running and subscribed, but only the two replicas in one zone are active. The other two are passive and can be ignored in our outage scenario.
The lost messages are detected within the active availability zones. We observe the message outage between the publisher and the subscriber in the same zone.
We want a one-to-many pub-sub configuration. All subscribers get a copy of any published message. Some of our developers believe service bus requires a unique subscription name for all subscribers. Others believe service bus would also work if all the subscribers used the same subscription name. The documentation I've found is ambiguous on the question. For now, each replica subscribes with a unique subscription name. Are we using the right subscription name for the scenario?
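To make the two positions concrete, here is a toy in-memory model of how we understand topic delivery: every subscription gets its own copy of a published message, while receivers that share one subscription compete for a single copy. This is only a sketch of that understanding, not the Service Bus or JMS API; `ToyTopic` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of topic/subscription delivery; not the Service Bus API. */
final class ToyTopic {
    // subscription name -> pending messages for that subscription
    private final Map<String, List<String>> subscriptions = new LinkedHashMap<>();

    /** Creates the subscription if it does not exist; a repeated name reuses it. */
    void subscribe(String subscriptionName) {
        subscriptions.putIfAbsent(subscriptionName, new ArrayList<>());
    }

    /** Publishing places one copy of the message in every subscription. */
    void publish(String message) {
        for (List<String> pending : subscriptions.values()) {
            pending.add(message);
        }
    }

    /** A receiver takes the next message from one subscription, or null if none is pending. */
    String receive(String subscriptionName) {
        List<String> pending = subscriptions.get(subscriptionName);
        return (pending == null || pending.isEmpty()) ? null : pending.remove(0);
    }
}
```

In this model, two replicas subscribed as `replica-a` and `replica-b` each receive the message, whereas two replicas both reading from a subscription named `shared` see it only once between them. If that matches the broker's real behavior, a unique name per replica is what a broadcast needs.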
We also rely on the assumption that an exception would be raised if the client library could not publish a message. We do not see another way to detect a publish error, and we do not see this exception from the client library.
We don't see any errors or issues reported by the azure admin portal. Please advise if there is something specific we can look for.
We are running at premium tier.
We will close this issue as it has been open for a while. If you have any further questions or need assistance, please feel free to reopen it. Thank you!