azure-service-bus-java icon indicating copy to clipboard operation
azure-service-bus-java copied to clipboard

MessageSender is unable to recover from network outage

Open matyee opened this issue 6 years ago • 4 comments

I use an IMessageSender created by ClientFactory.createMessageSenderFromConnectionStringBuilder to feed a queue that I created according to the samples repository. It works well until I simulate a network outage. First, I simulated the network outage by turning off the WiFi on my laptop. I got send timeout errors for a while and when I turned the WiFi back, it kept giving timeout exceptions. I thought that turning off the WiFi is too destructive, it can cause problems in the networking stack so I went more subtle. Currently I simulate the network outage by dropping incoming packets for 45 seconds.

sudo iptables -I INPUT -i wlan0 -j DROP ; sleep 45; sudo iptables -D INPUT -i wlan0 -j DROP;

Networking is restored after 45 seconds, but I keep getting the timeouts seemingly indefinitely.

2018-11-24 19:04:20.794 TRACE 24344 --- [pool-1-thread-1] c.n.asrscenariomanager.RunQueueFeeder    : Sending message to run queue for run 1dce6166-311b-421c-aa74-4c22371983ea
2018-11-24 19:04:50.796  WARN 24344 --- [pool-4-thread-1] c.m.a.s.primitives.CoreMessageSender     : Delivery '22f5e3aa46124c0890d3fa02098caad6' to 'BasicQueue' did not receive ack from service. Throwing timeout.
2018-11-24 19:04:50.800 ERROR 24344 --- [pool-4-thread-1] c.m.a.s.primitives.CoreMessageSender     : Send timed out

com.microsoft.azure.servicebus.primitives.TimeoutException: Send operation timed out  at  2018-11-24T19:04:50.797Z[Europe/London].
	at com.microsoft.azure.servicebus.primitives.CoreMessageSender.throwSenderTimeout(CoreMessageSender.java:887) [azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.primitives.CoreMessageSender.access$500(CoreMessageSender.java:59) [azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.primitives.CoreMessageSender$3.run(CoreMessageSender.java:260) [azure-servicebus-1.2.8.jar:na]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_91]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_91]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]

2018-11-24 19:04:50.803 ERROR 24344 --- [pool-3-thread-3] c.n.asrscenariomanager.RunQueueFeeder    : Failed to process run queue for run 1dce6166-311b-421c-aa74-4c22371983ea: java.util.concurrent.CompletionException: com.microsoft.azure.servicebus.primitives.TimeoutException: Send operation timed out  at  2018-11-24T19:04:50.797Z[Europe/London]., errorContext[NS: nmisbtest.servicebus.windows.net, PATH: BasicQueue, REFERENCE_ID: 42e9c9_880ac5bed8fd483f86aa5944d05aed9b_G28, LINK_CREDIT: 998]
2018-11-24 19:04:50.997 TRACE 24344 --- [pool-1-thread-1] c.n.asrscenariomanager.RunQueueFeeder    : Sending message to run queue for run 1dce6166-311b-421c-aa74-4c22371983ea
2018-11-24 19:05:20.997  WARN 24344 --- [pool-4-thread-3] c.m.a.s.primitives.CoreMessageSender     : Delivery '6d9af9eb32cb4addb9205fc8d2df292e' to 'BasicQueue' did not receive ack from service. Throwing timeout.
2018-11-24 19:05:20.998 ERROR 24344 --- [pool-4-thread-3] c.m.a.s.primitives.CoreMessageSender     : Send timed out

com.microsoft.azure.servicebus.primitives.TimeoutException: Send operation timed out  at  2018-11-24T19:05:20.998Z[Europe/London].
	at com.microsoft.azure.servicebus.primitives.CoreMessageSender.throwSenderTimeout(CoreMessageSender.java:887) [azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.primitives.CoreMessageSender.access$500(CoreMessageSender.java:59) [azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.primitives.CoreMessageSender$3.run(CoreMessageSender.java:260) [azure-servicebus-1.2.8.jar:na]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_91]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_91]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]

2018-11-24 19:05:20.999 ERROR 24344 --- [pool-3-thread-3] c.n.asrscenariomanager.RunQueueFeeder    : Failed to process run queue for run 1dce6166-311b-421c-aa74-4c22371983ea: java.util.concurrent.CompletionException: com.microsoft.azure.servicebus.primitives.TimeoutException: Send operation timed out  at  2018-11-24T19:05:20.998Z[Europe/London]., errorContext[NS: nmisbtest.servicebus.windows.net, PATH: BasicQueue, REFERENCE_ID: 42e9c9_880ac5bed8fd483f86aa5944d05aed9b_G28, LINK_CREDIT: 997]
2018-11-24 19:05:21.160 TRACE 24344 --- [pool-1-thread-1] c.n.asrscenariomanager.RunQueueFeeder    : Sending message to run queue for run 1dce6166-311b-421c-aa74-4c22371983ea
2018-11-24 19:05:51.161  WARN 24344 --- [pool-4-thread-4] c.m.a.s.primitives.CoreMessageSender     : Delivery '1cee2726058d4a1d9b6966d87edc879e' to 'BasicQueue' did not receive ack from service. Throwing timeout.
2018-11-24 19:05:51.161 ERROR 24344 --- [pool-4-thread-4] c.m.a.s.primitives.CoreMessageSender     : Send timed out

com.microsoft.azure.servicebus.primitives.TimeoutException: Send operation timed out  at  2018-11-24T19:05:51.161Z[Europe/London].
	at com.microsoft.azure.servicebus.primitives.CoreMessageSender.throwSenderTimeout(CoreMessageSender.java:887) [azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.primitives.CoreMessageSender.access$500(CoreMessageSender.java:59) [azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.primitives.CoreMessageSender$3.run(CoreMessageSender.java:260) [azure-servicebus-1.2.8.jar:na]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_91]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_91]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]

2018-11-24 19:05:51.162 ERROR 24344 --- [pool-3-thread-3] c.n.asrscenariomanager.RunQueueFeeder    : Failed to process run queue for run 1dce6166-311b-421c-aa74-4c22371983ea: java.util.concurrent.CompletionException: com.microsoft.azure.servicebus.primitives.TimeoutException: Send operation timed out  at  2018-11-24T19:05:51.161Z[Europe/London]., errorContext[NS: nmisbtest.servicebus.windows.net, PATH: BasicQueue, REFERENCE_ID: 42e9c9_880ac5bed8fd483f86aa5944d05aed9b_G28, LINK_CREDIT: 996]
2018-11-24 19:05:51.297 TRACE 24344 --- [pool-1-thread-1] c.n.asrscenariomanager.RunQueueFeeder    : Sending message to run queue for run 1dce6166-311b-421c-aa74-4c22371983ea
2018-11-24 19:06:21.298  WARN 24344 --- [pool-4-thread-2] c.m.a.s.primitives.CoreMessageSender     : Delivery '8dff44d9d8b540b581df5a9309ffa1bd' to 'BasicQueue' did not receive ack from service. Throwing timeout.
2018-11-24 19:06:21.299 ERROR 24344 --- [pool-4-thread-2] c.m.a.s.primitives.CoreMessageSender     : Send timed out

com.microsoft.azure.servicebus.primitives.TimeoutException: Send operation timed out  at  2018-11-24T19:06:21.299Z[Europe/London].
	at com.microsoft.azure.servicebus.primitives.CoreMessageSender.throwSenderTimeout(CoreMessageSender.java:887) [azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.primitives.CoreMessageSender.access$500(CoreMessageSender.java:59) [azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.primitives.CoreMessageSender$3.run(CoreMessageSender.java:260) [azure-servicebus-1.2.8.jar:na]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_91]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_91]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]

2018-11-24 19:06:21.300 ERROR 24344 --- [pool-3-thread-3] c.n.asrscenariomanager.RunQueueFeeder    : Failed to process run queue for run 1dce6166-311b-421c-aa74-4c22371983ea: java.util.concurrent.CompletionException: com.microsoft.azure.servicebus.primitives.TimeoutException: Send operation timed out  at  2018-11-24T19:06:21.299Z[Europe/London]., errorContext[NS: nmisbtest.servicebus.windows.net, PATH: BasicQueue, REFERENCE_ID: 42e9c9_880ac5bed8fd483f86aa5944d05aed9b_G28, LINK_CREDIT: 995]

If I restart the application the message is sent right away.

Environment:

  • OS: Ubuntu 18.04
  • SDK Version 1.2.8
  • Oracle JAVA 1.8.0_91

matyee avatar Nov 24 '18 19:11 matyee

I did the network outage test again and let it running to see if something changes after a longer period of time. Unfortunately, my stdout buffer was not big enough to see all the logged messages, but I can see that there was a timeout exception at 19:19:24 and the timeout exceptions kept coming until 19:32:72 when I got the following:

2018-11-24 19:32:57.115 TRACE 25271 --- [pool-1-thread-1] c.n.asrscenariomanager.RunQueueFeeder    : Sending message to run queue for run 1dce6166-311b-421c-aa74-4c22371983ea
2018-11-24 19:33:03.376  WARN 25271 --- [5f-b43b7618d20b] c.m.a.servicebus.amqp.ConnectionHandler  : Connection.onTransportError: hostname:nmisbtest.servicebus.windows.net:5671, error:connection aborted
2018-11-24 19:33:03.376 ERROR 25271 --- [5f-b43b7618d20b] c.m.a.s.primitives.MessagingFactory      : Connection error. 'Error{condition=amqp:connection:framing-error, description='connection aborted', info=null}'
2018-11-24 19:33:03.376  INFO 25271 --- [5f-b43b7618d20b] c.m.a.s.primitives.MessagingFactory      : Closing connection to host
2018-11-24 19:33:03.378  WARN 25271 --- [5f-b43b7618d20b] c.m.a.s.primitives.RequestResponseLink   : Internal send link 'RequestResponseLink-Sender_195914_75d58a56b2254f9c870ca3d2b43b766e_G5' of requestresponselink to '$cbs' closed with error.

com.microsoft.azure.servicebus.primitives.ServiceBusException: Error{condition=amqp:connection:framing-error, description='connection aborted', info=null}
	at com.microsoft.azure.servicebus.primitives.ExceptionUtil.toException(ExceptionUtil.java:113) ~[azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.primitives.RequestResponseLink$InternalSender.onClose(RequestResponseLink.java:846) ~[azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.amqp.BaseLinkHandler.processOnClose(BaseLinkHandler.java:68) [azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.primitives.MessagingFactory.closeConnection(MessagingFactory.java:376) [azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.primitives.MessagingFactory.onConnectionError(MessagingFactory.java:301) [azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.amqp.ConnectionHandler.onTransportError(ConnectionHandler.java:84) [azure-servicebus-1.2.8.jar:na]
	at org.apache.qpid.proton.engine.BaseHandler.handle(BaseHandler.java:191) [proton-j-0.22.0.jar:na]
	at org.apache.qpid.proton.engine.impl.EventImpl.dispatch(EventImpl.java:108) [proton-j-0.22.0.jar:na]
	at org.apache.qpid.proton.reactor.impl.ReactorImpl.dispatch(ReactorImpl.java:309) [proton-j-0.22.0.jar:na]
	at org.apache.qpid.proton.reactor.impl.ReactorImpl.process(ReactorImpl.java:276) [proton-j-0.22.0.jar:na]
	at com.microsoft.azure.servicebus.primitives.MessagingFactory$RunReactor.run(MessagingFactory.java:484) [azure-servicebus-1.2.8.jar:na]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]

2018-11-24 19:33:03.379  WARN 25271 --- [5f-b43b7618d20b] c.m.a.s.primitives.RequestResponseLink   : Completing all pending requests with exception in request response link to $cbs
2018-11-24 19:33:03.380  WARN 25271 --- [5f-b43b7618d20b] c.m.a.s.primitives.RequestResponseLink   : Internal receive link 'RequestResponseLink-Receiver_0a5164_75d58a56b2254f9c870ca3d2b43b766e_G5' of requestresponselink to '$cbs' closed with error.

com.microsoft.azure.servicebus.primitives.ServiceBusException: Error{condition=amqp:connection:framing-error, description='connection aborted', info=null}
	at com.microsoft.azure.servicebus.primitives.ExceptionUtil.toException(ExceptionUtil.java:113) ~[azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.primitives.RequestResponseLink$InternalReceiver.onClose(RequestResponseLink.java:637) ~[azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.amqp.BaseLinkHandler.processOnClose(BaseLinkHandler.java:68) [azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.primitives.MessagingFactory.closeConnection(MessagingFactory.java:376) [azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.primitives.MessagingFactory.onConnectionError(MessagingFactory.java:301) [azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.amqp.ConnectionHandler.onTransportError(ConnectionHandler.java:84) [azure-servicebus-1.2.8.jar:na]
	at org.apache.qpid.proton.engine.BaseHandler.handle(BaseHandler.java:191) [proton-j-0.22.0.jar:na]
	at org.apache.qpid.proton.engine.impl.EventImpl.dispatch(EventImpl.java:108) [proton-j-0.22.0.jar:na]
	at org.apache.qpid.proton.reactor.impl.ReactorImpl.dispatch(ReactorImpl.java:309) [proton-j-0.22.0.jar:na]
	at org.apache.qpid.proton.reactor.impl.ReactorImpl.process(ReactorImpl.java:276) [proton-j-0.22.0.jar:na]
	at com.microsoft.azure.servicebus.primitives.MessagingFactory$RunReactor.run(MessagingFactory.java:484) [azure-servicebus-1.2.8.jar:na]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]

2018-11-24 19:33:03.380  WARN 25271 --- [5f-b43b7618d20b] c.m.a.s.primitives.RequestResponseLink   : Completing all pending requests with exception in request response link to $cbs
2018-11-24 19:33:03.382  WARN 25271 --- [5f-b43b7618d20b] c.m.a.s.primitives.CoreMessageSender     : Send link 'e9d8a6_75d58a56b2254f9c870ca3d2b43b766e_G5' to 'BasicQueue' closed. Will retry link creation after 'PT1S'.
2018-11-24 19:33:04.383  INFO 25271 --- [pool-4-thread-2] c.m.a.s.primitives.CoreMessageSender     : Recreating send link to 'BasicQueue'
2018-11-24 19:33:04.384  INFO 25271 --- [pool-3-thread-4] c.m.a.s.primitives.RequestResponseLink   : RequestResponseLink - recreating internal send and receive links to $cbs
2018-11-24 19:33:04.389  INFO 25271 --- [5f-b43b7618d20b] c.m.a.s.primitives.MessagingFactory      : Creating connection to host 'nmisbtest.servicebus.windows.net:5671'
2018-11-24 19:33:04.615  INFO 25271 --- [5f-b43b7618d20b] c.m.a.s.primitives.MessagingFactory      : Connection opened to host.
2018-11-24 19:33:04.622  INFO 25271 --- [pool-3-thread-5] c.m.a.s.primitives.RequestResponseLink   : Recreated internal links to $cbs
2018-11-24 19:33:04.645  INFO 25271 --- [5f-b43b7618d20b] c.m.a.s.primitives.CoreMessageSender     : Creating send link to 'BasicQueue'

After the connection was closed and recreated the message got sent to the queue.

matyee avatar Nov 24 '18 19:11 matyee

The same problem affects the MessageReceiver created by ClientFactory.createMessageReceiverFromConnectionStringBuilder. I use the following code for the test.

	public void receiveLoop() {
		log.trace("Next cycle");
		try {
			IMessage message = this.getMessageReceiver().receive(Duration.ofSeconds(60));
			if (message != null) {
				log.trace("Message received");
				this.messageReceiver.completeAsync(message.getLockToken());
				Thread.sleep(5000); // Sleep so I have time to cut network connection
			}
		} catch (Throwable t) {
			log.error( t.toString());
		}
		this.receiveLoop();
	}

I block the incoming packets with iptables for 45 seconds again when loop is sleeping for 5 seconds.

sudo iptables -I INPUT -i wlan0 -j DROP ; sleep 45; sudo iptables -D INPUT -i wlan0 -j DROP;

The result shows that no new messages are received until the connection is aborted and reopened.

2018-11-24 20:48:51.748 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Next cycle
2018-11-24 20:48:51.775 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Message received
2018-11-24 20:48:56.776 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Next cycle
2018-11-24 20:48:56.801 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Message received
2018-11-24 20:49:01.802 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Next cycle
2018-11-24 20:50:01.803 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Next cycle
2018-11-24 20:51:01.810 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Next cycle
2018-11-24 20:52:01.811 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Next cycle
2018-11-24 20:53:01.813 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Next cycle
2018-11-24 20:54:01.818 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Next cycle
2018-11-24 20:55:01.819 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Next cycle
2018-11-24 20:56:01.820 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Next cycle
2018-11-24 20:57:01.821 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Next cycle
2018-11-24 20:58:01.822 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Next cycle
2018-11-24 20:59:01.823 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Next cycle
2018-11-24 21:00:01.824 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Next cycle
2018-11-24 21:01:01.826 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Next cycle
2018-11-24 21:02:01.827 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Next cycle
2018-11-24 21:03:01.828 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Next cycle
2018-11-24 21:04:01.829 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Next cycle
2018-11-24 21:05:01.830 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Next cycle
2018-11-24 21:05:10.928  WARN 28356 --- [b0-655f1c4cdbae] c.m.a.servicebus.amqp.ConnectionHandler  : Connection.onTransportError: hostname:nmisbtest.servicebus.windows.net:5671, error:connection aborted
2018-11-24 21:05:10.929 ERROR 28356 --- [b0-655f1c4cdbae] c.m.a.s.primitives.MessagingFactory      : Connection error. 'Error{condition=amqp:connection:framing-error, description='connection aborted', info=null}'
2018-11-24 21:05:10.930  INFO 28356 --- [b0-655f1c4cdbae] c.m.a.s.primitives.MessagingFactory      : Closing connection to host
2018-11-24 21:05:10.940  WARN 28356 --- [b0-655f1c4cdbae] c.m.a.s.primitives.RequestResponseLink   : Internal send link 'RequestResponseLink-Sender_fcbc96_540fd470e6814d10abd814c326e7e758_G24' of requestresponselink to '$cbs' closed with error.

com.microsoft.azure.servicebus.primitives.ServiceBusException: Error{condition=amqp:connection:framing-error, description='connection aborted', info=null}
	at com.microsoft.azure.servicebus.primitives.ExceptionUtil.toException(ExceptionUtil.java:113) ~[azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.primitives.RequestResponseLink$InternalSender.onClose(RequestResponseLink.java:846) ~[azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.amqp.BaseLinkHandler.processOnClose(BaseLinkHandler.java:68) [azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.primitives.MessagingFactory.closeConnection(MessagingFactory.java:376) [azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.primitives.MessagingFactory.onConnectionError(MessagingFactory.java:301) [azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.amqp.ConnectionHandler.onTransportError(ConnectionHandler.java:84) [azure-servicebus-1.2.8.jar:na]
	at org.apache.qpid.proton.engine.BaseHandler.handle(BaseHandler.java:191) [proton-j-0.22.0.jar:na]
	at org.apache.qpid.proton.engine.impl.EventImpl.dispatch(EventImpl.java:108) [proton-j-0.22.0.jar:na]
	at org.apache.qpid.proton.reactor.impl.ReactorImpl.dispatch(ReactorImpl.java:309) [proton-j-0.22.0.jar:na]
	at org.apache.qpid.proton.reactor.impl.ReactorImpl.process(ReactorImpl.java:276) [proton-j-0.22.0.jar:na]
	at com.microsoft.azure.servicebus.primitives.MessagingFactory$RunReactor.run(MessagingFactory.java:484) [azure-servicebus-1.2.8.jar:na]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]

2018-11-24 21:05:10.941  WARN 28356 --- [b0-655f1c4cdbae] c.m.a.s.primitives.RequestResponseLink   : Completing all pending requests with exception in request response link to $cbs
2018-11-24 21:05:10.941  WARN 28356 --- [b0-655f1c4cdbae] c.m.a.s.primitives.RequestResponseLink   : Internal receive link 'RequestResponseLink-Receiver_da0915_540fd470e6814d10abd814c326e7e758_G24' of requestresponselink to '$cbs' closed with error.

com.microsoft.azure.servicebus.primitives.ServiceBusException: Error{condition=amqp:connection:framing-error, description='connection aborted', info=null}
	at com.microsoft.azure.servicebus.primitives.ExceptionUtil.toException(ExceptionUtil.java:113) ~[azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.primitives.RequestResponseLink$InternalReceiver.onClose(RequestResponseLink.java:637) ~[azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.amqp.BaseLinkHandler.processOnClose(BaseLinkHandler.java:68) [azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.primitives.MessagingFactory.closeConnection(MessagingFactory.java:376) [azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.primitives.MessagingFactory.onConnectionError(MessagingFactory.java:301) [azure-servicebus-1.2.8.jar:na]
	at com.microsoft.azure.servicebus.amqp.ConnectionHandler.onTransportError(ConnectionHandler.java:84) [azure-servicebus-1.2.8.jar:na]
	at org.apache.qpid.proton.engine.BaseHandler.handle(BaseHandler.java:191) [proton-j-0.22.0.jar:na]
	at org.apache.qpid.proton.engine.impl.EventImpl.dispatch(EventImpl.java:108) [proton-j-0.22.0.jar:na]
	at org.apache.qpid.proton.reactor.impl.ReactorImpl.dispatch(ReactorImpl.java:309) [proton-j-0.22.0.jar:na]
	at org.apache.qpid.proton.reactor.impl.ReactorImpl.process(ReactorImpl.java:276) [proton-j-0.22.0.jar:na]
	at com.microsoft.azure.servicebus.primitives.MessagingFactory$RunReactor.run(MessagingFactory.java:484) [azure-servicebus-1.2.8.jar:na]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]

2018-11-24 21:05:10.941  WARN 28356 --- [b0-655f1c4cdbae] c.m.a.s.primitives.RequestResponseLink   : Completing all pending requests with exception in request response link to $cbs
2018-11-24 21:05:11.945  INFO 28356 --- [ool-1-thread-35] c.m.a.s.primitives.RequestResponseLink   : RequestResponseLink - recreating internal send and receive links to $cbs
2018-11-24 21:05:11.948  INFO 28356 --- [b0-655f1c4cdbae] c.m.a.s.primitives.MessagingFactory      : Creating connection to host 'nmisbtest.servicebus.windows.net:5671'
2018-11-24 21:05:12.178  INFO 28356 --- [b0-655f1c4cdbae] c.m.a.s.primitives.MessagingFactory      : Connection opened to host.
2018-11-24 21:05:12.179  INFO 28356 --- [ool-1-thread-34] c.m.a.s.primitives.RequestResponseLink   : Recreated internal links to $cbs
2018-11-24 21:06:01.831 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Next cycle
2018-11-24 21:06:01.859 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Message received
2018-11-24 21:06:06.860 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Next cycle
2018-11-24 21:06:06.886 TRACE 28356 --- [onPool-worker-1] com.nminet.asrscenariorunner.RunManager  : Message received

matyee avatar Nov 24 '18 21:11 matyee

Now I see the traces. As you said, you simulated network outage at 19:19:24 and the trace showing that the connection was aborted is logged at 19:33:03. Ideally network outage should immediately trigger a "connection aborted" event. The SDK will recreate an aborted connection automatically. I have tested this.

In your case, the time lag between network outage and the generated "connection aborted" event is little over 13 minutes. That's why you are seeing timeouts for all that 13 min interval. I really can't say why there was that lag. But we saw similar issue from another customer running their application inside a container. Are you running your application inside a container? If yes, could you try the scenario by running your application outside a container?

yvgopal avatar Apr 15 '19 22:04 yvgopal

hi @yvgopal could you please tell me whether this "But we saw similar issue from another customer running their application inside a container" issue fixed? We have another customer with similar issue, customer using Java SDK run in Function App docker container .the connection disconnected for a few minutes but there isn't any exception in Service Bus and Java Exception. It cannot reconnect with Service Bus server and always waiting connect, until customer restart the Function App. Do you have any idea of it ? thank you

Scarlettliuyc avatar Jul 19 '21 02:07 Scarlettliuyc