very weird problem (possibly related to SSL connections handling in reactive RabbitMQ client)
Expected Behavior
Application should be capable of receiving messages reliably under load, even over AMQPS
Actual Behavior
We have a case when reactive RabbitMQ client throws exceptions, loses connection and is basically unusable for prolonged periods of time
Steps to Reproduce
Unfortunately, I can not yet produce a reliably reproducing case that would not depend on our in-house RabbitMQ installation, This is what I can share though:
We have a problem which I only could reproduce against our in-house RabbitMQ server, and even then only under some specific circumstances.
the gist of the problem is that occassionally, when our application (WebFlux/Reactive-RabbitMQ) is publishing and receiving many messages in parallel (think more then 20 parallel consumers, I use k6 --vus 50 to create load), it starts throwing exceptions like this:
com.rabbitmq.client.AlreadyClosedException: connection is already closed due to connection error; cause: javax.net.ssl.SSLException: Tag mismatch!
at com.rabbitmq.client.impl.AMQConnection.ensureIsOpen(AMQConnection.java:175) ~[amqp-client-5.9.0.jar:5.9.0]
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
and
javax.net.ssl.SSLException: Buffer closed
at com.rabbitmq.client.impl.nio.SslEngineHelper.write(SslEngineHelper.java:176) ~[amqp-client-5.9.0.jar:5.9.0]
at com.rabbitmq.client.impl.nio.SslEngineByteBufferOutputStream.doFlush(SslEngineByteBufferOutputStream.java:59) ~[amqp-client-5.9.0.jar:5.9.0]
at com.rabbitmq.client.impl.nio.SslEngineByteBufferOutputStream.flush(SslEngineByteBufferOutputStream.java:53) ~[amqp-client-5.9.0.jar:5.9.0]
at java.base/java.io.DataOutputStream.flush(DataOutputStream.java:123) ~[na:na]
at com.rabbitmq.client.impl.nio.NioLoop.run(NioLoop.java:244) ~[amqp-client-5.9.0.jar:5.9.0]
at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]
and
reactor.rabbitmq.RabbitFluxException: Error while setting publisher confirms on channel
at reactor.rabbitmq.Sender.lambda$sendWithTypedPublishConfirms$6(Sender.java:282) ~[reactor-rabbitmq-1.4.4.RELEASE.jar:1.4.4.RELEASE]
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
etc. There were certainly some other ones, can't find in the logs right now. When I run exactly the same scenario with "traditional" stack (Spring WebMVC+Spring AMQP), not a single exception of this sort occurs.
I could collect whichever information is necessary though, if I would 've been suggested appropriate tools or suggestions on how to pin it down
(it was first shared on Gitter, but no reaction followed, so I decided to raise an issue and see if anyone would want to chime in)
Possible Solution
Your Environment
- Reactive-RabbitMQ: 1.4.4.RELEASE
- Reactor version(s) used: 3.3.5.RELEASE
- Other relevant libraries versions (eg.
netty, ...): 4.1.49.Final - JVM version (
java -version): 11.0.2 (same with 8.0.172) - OS and version (eg
uname -a): Win10 x64
Please provide the broker and Erlang versions, as well the broker logs when the problems occurs. It seems the connection is closed for some reason, so the broker logs will be helpful to find out more.
Please /cc me on Gitter for any question related to Reactor RabbitMQ.
RabbitMQ 3.8.3, Erlang 22.2.1
logs will come later
more exception from the application:
avax.net.ssl|ERROR|77|rabbitmq-nio|2020-09-17 11:37:26.908 EEST|TransportContext.java:313|Fatal (UNEXPECTED_MESSAGE): Input record too big: max = 16709 len = 43125 (
"throwable" : {
javax.net.ssl.SSLProtocolException: Input record too big: max = 16709 len = 43125
at java.base/sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:591)
at java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:443)
at java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:422)
at java.base/javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:634)
at com.rabbitmq.client.impl.nio.SslEngineFrameBuilder.somethingToRead(SslEngineFrameBuilder.java:49)
at com.rabbitmq.client.impl.nio.FrameBuilder.readFrame(FrameBuilder.java:71)
at com.rabbitmq.client.impl.nio.NioLoop.run(NioLoop.java:156)
at java.base/java.lang.Thread.run(Thread.java:834)}
)
javax.net.ssl|ALL|77|rabbitmq-nio|2020-09-17 11:37:26.909 EEST|SSLSessionImpl.java:763|Invalidated session: Session(1600329088114|TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384)
2020-09-17 11:37:26.917 WARN 46868 --- [ rabbitmq-nio] com.rabbitmq.client.impl.nio.NioLoop : Error during reading frames
javax.net.ssl.SSLProtocolException: Input record too big: max = 16709 len = 43125
at java.base/sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:591) ~[na:na]
at java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:443) ~[na:na]
at java.base/sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:422) ~[na:na]
at java.base/javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:634) ~[na:na]
at com.rabbitmq.client.impl.nio.SslEngineFrameBuilder.somethingToRead(SslEngineFrameBuilder.java:49) ~[amqp-client-5.9.0.jar:5.9.0]
at com.rabbitmq.client.impl.nio.FrameBuilder.readFrame(FrameBuilder.java:71) ~[amqp-client-5.9.0.jar:5.9.0]
at com.rabbitmq.client.impl.nio.NioLoop.run(NioLoop.java:156) ~[amqp-client-5.9.0.jar:5.9.0]
at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]
javax.net.ssl|WARNING|77|rabbitmq-nio|2020-09-17 11:37:26.919 EEST|SSLEngineOutputRecord.java:168|outbound has closed, ignore outbound application data
javax.net.ssl|DEBUG|77|rabbitmq-nio|2020-09-17 11:37:26.919 EEST|SSLEngineOutputRecord.java:507|WRITE: TLS12 alert, length = 2
javax.net.ssl|DEBUG|77|rabbitmq-nio|2020-09-17 11:37:26.919 EEST|SSLCipher.java:1727|Plaintext before ENCRYPTION (
0000: 02 0A ..
)
javax.net.ssl|DEBUG|77|rabbitmq-nio|2020-09-17 11:37:26.919 EEST|SSLEngineOutputRecord.java:525|Raw write (
0000: 15 03 03 00 1A 00 00 00 00 00 00 01 A3 CE 79 97 ..............y.
0010: F4 7D 9E 13 A9 89 38 85 31 4E C8 ED E2 E2 97 ......8.1N.....
)
Could you please try with the blocking IO mode as well?
you mean, without using .useNio on ConnectionFactory?
Yes.
hmm, with that commented out, the errors are gone. Uncommented again - here they go. what gives? Should I still extract logs ?
the only things related to my app around the time when problem was first manifested are:
2020-09-17 08:37:32.498 [info] <0.11968.3742> accepting AMQP connection <0.11968.3742> (10.61.4.8:49663 -> 10.61.41.67:5671)
2020-09-17 08:37:32.535 [info] <0.11968.3742> Connection <0.11968.3742> (10.61.4.8:49663 -> 10.61.41.67:5671) has a client-provided name: hello-esb
2020-09-17 08:37:32.569 [info] <0.11968.3742> connection <0.11968.3742> (10.61.4.8:49663 -> 10.61.41.67:5671 - hello-esb): user 'hello-app' authenticated and granted access to vhost 'esb-test'
2020-09-17 08:37:27.348 [warning] <0.17059.5998> closing AMQP connection <0.17059.5998> (10.61.4.9:59276 -> 10.61.41.66:5671 - hello-esb, vhost: 'esb-test', user: 'hello-app'):
This might be a bug in the Java client NIO connector. Contrary to blocking IO, TLS is not transparent with NIO, it requires crypting/decrypting before leaving/reaching the application code.
It would be great to be able to reproduce independently from Reactor RabbitMQ.
As a workaround you can switch to the blocking IO mode for now. This should not affect your application unless you use many connections (understand 100s or 1000s of connections in your client applications).
that's cool!! every single one example in the documentation uses that, so I basically assumed that to be a pre-condition for reactive behaviour. Thanks for the suggestion