py-amqp
py-amqp copied to clipboard
SSL protocol is shutdown
I'm running a celery worker which uses a RabbitMQ broker via an SSL connection. From time to time we're seeing an error in our logs
SSLError(1, '[SSL: PROTOCOL_IS_SHUTDOWN] protocol is shutdown (_ssl.c:2162)')
Which is coming from this line:
https://github.com/celery/py-amqp/blob/master/amqp/transport.py#L430
My best understanding is that amqp
is attempting to write to a socket that has previously been closed, is it possible to know which side terminate the socket? Can there be any better error handling in this case? Or any possibility to recover from the error?
Can you please attach the RabbitMQ logs from the time when the exception happened?
Here are some RabbitMQ logs surrounding the exact time we got an SSLError
:
2020-06-28 13:04:29.152 [warning] <0.37.186> closing AMQP connection <0.37.186> (10.154.0.40:44566 -> 10.54.3.7:5671, vhost: 'our-pod-name', user: 'our-pod-name'):
client unexpectedly closed TCP connection
2020-06-28 13:04:29.188 [warning] <0.742.186> closing AMQP connection <0.742.186> (10.154.0.42:44640 -> 10.54.3.7:5671, vhost: 'our-pod-name', user: 'our-pod-name'):
client unexpectedly closed TCP connection
####### SSLError came up on Celery logs at 13:04:45 #########
2020-06-28 13:05:32.387 [info] <0.10410.186> accepting AMQP connection <0.10410.186> (10.154.0.87:43848 -> 10.54.3.7:5671)
2020-06-28 13:05:32.407 [info] <0.10410.186> connection <0.10410.186> (10.154.0.87:43848 -> 10.54.3.7:5671): user 'our-pod-name' authenticated and granted access to vhost 'our-pod-name'
2020-06-28 13:05:35.533 [info] <0.10423.186> accepting AMQP connection <0.10423.186> (10.154.0.87:43850 -> 10.54.3.7:5671)
2020-06-28 13:05:35.552 [info] <0.10423.186> connection <0.10423.186> (10.154.0.87:43850 -> 10.54.3.7:5671): user 'our-pod-name' authenticated and granted access to vhost 'our-pod-name'
2020-06-28 13:05:35.660 [info] <0.10456.186> accepting AMQP connection <0.10456.186> (10.154.0.40:43852 -> 10.54.3.7:5671)
2020-06-28 13:05:35.680 [info] <0.10456.186> connection <0.10456.186> (10.154.0.40:43852 -> 10.54.3.7:5671): user 'our-pod-name' authenticated and granted access to vhost 'our-pod-name'
2020-06-28 13:05:35.771 [info] <0.10465.186> accepting AMQP connection <0.10465.186> (10.154.0.56:58578 -> 10.54.3.7:5671)
Some more details on our setup, the Celery workers are connecting to Rabbit on a separate k8s cluster, with a NAT instance sitting in the middle.
Our current working theory is that the NAT instance is doing something to some of the connections, and Celery/Kombu/pyamqp seems to be angry about it.
We'll continue to debug the exact root cause, but curious if we can get the Celery stack to be more graceful about these dropped connections.
Since I don't really know what this error is and I can't find a description of it anywhere, I'm not sure how to help.
This kind of error is raised when the SSL wrapper tries to write to a closed TCP connection. It is usually related to connection problems, firewalls or SSL interception.
I asked on StackOverflow. That's the answer I got.
@thedrow Thanks for helping us debug this error!
We're still trying to debug where our connections are being dropped and fix the root cause.
Eventually it would be nice to fix this behavior to handle drops more gracefully, and I'd be happy to help propose a patch for that.
@yuvadm did you find something ?
@Kristin0 Didn't really continue working on this issue, sorry
@Kristin0 If you have this problem. Please share all the details as well.