py-amqp icon indicating copy to clipboard operation
py-amqp copied to clipboard

SSL protocol is shutdown

Open yuvadm opened this issue 4 years ago • 9 comments

I'm running a celery worker which uses a RabbitMQ broker via an SSL connection. From time to time we're seeing an error in our logs

SSLError(1, '[SSL: PROTOCOL_IS_SHUTDOWN] protocol is shutdown (_ssl.c:2162)')

Which is coming from this line:

https://github.com/celery/py-amqp/blob/master/amqp/transport.py#L430

My best understanding is that amqp is attempting to write to a socket that has previously been closed, is it possible to know which side terminate the socket? Can there be any better error handling in this case? Or any possibility to recover from the error?

yuvadm avatar Jun 25 '20 13:06 yuvadm

Can you please attach the RabbitMQ logs from the time when the exception happened?

thedrow avatar Jun 28 '20 11:06 thedrow

Here are some RabbitMQ logs surrounding the exact time we got an SSLError:

2020-06-28 13:04:29.152 [warning] <0.37.186> closing AMQP connection <0.37.186> (10.154.0.40:44566 -> 10.54.3.7:5671, vhost: 'our-pod-name', user: 'our-pod-name'):
client unexpectedly closed TCP connection
2020-06-28 13:04:29.188 [warning] <0.742.186> closing AMQP connection <0.742.186> (10.154.0.42:44640 -> 10.54.3.7:5671, vhost: 'our-pod-name', user: 'our-pod-name'):
client unexpectedly closed TCP connection

####### SSLError came up on Celery logs at 13:04:45 #########

2020-06-28 13:05:32.387 [info] <0.10410.186> accepting AMQP connection <0.10410.186> (10.154.0.87:43848 -> 10.54.3.7:5671)
2020-06-28 13:05:32.407 [info] <0.10410.186> connection <0.10410.186> (10.154.0.87:43848 -> 10.54.3.7:5671): user 'our-pod-name' authenticated and granted access to vhost 'our-pod-name'
2020-06-28 13:05:35.533 [info] <0.10423.186> accepting AMQP connection <0.10423.186> (10.154.0.87:43850 -> 10.54.3.7:5671)
2020-06-28 13:05:35.552 [info] <0.10423.186> connection <0.10423.186> (10.154.0.87:43850 -> 10.54.3.7:5671): user 'our-pod-name' authenticated and granted access to vhost 'our-pod-name'
2020-06-28 13:05:35.660 [info] <0.10456.186> accepting AMQP connection <0.10456.186> (10.154.0.40:43852 -> 10.54.3.7:5671)
2020-06-28 13:05:35.680 [info] <0.10456.186> connection <0.10456.186> (10.154.0.40:43852 -> 10.54.3.7:5671): user 'our-pod-name' authenticated and granted access to vhost 'our-pod-name'
2020-06-28 13:05:35.771 [info] <0.10465.186> accepting AMQP connection <0.10465.186> (10.154.0.56:58578 -> 10.54.3.7:5671)

yuvadm avatar Jun 28 '20 13:06 yuvadm

Some more details on our setup, the Celery workers are connecting to Rabbit on a separate k8s cluster, with a NAT instance sitting in the middle.

Our current working theory is that the NAT instance is doing something to some of the connections, and Celery/Kombu/pyamqp seems to be angry about it.

We'll continue to debug the exact root cause, but curious if we can get the Celery stack to be more graceful about these dropped connections.

yuvadm avatar Jun 28 '20 14:06 yuvadm

Since I don't really know what this error is and I can't find a description of it anywhere, I'm not sure how to help.

thedrow avatar Jun 29 '20 08:06 thedrow

This kind of error is raised when the SSL wrapper tries to write to a closed TCP connection. It is usually related to connection problems, firewalls or SSL interception.

I asked on StackOverflow. That's the answer I got.

thedrow avatar Jul 01 '20 07:07 thedrow

@thedrow Thanks for helping us debug this error!

We're still trying to debug where our connections are being dropped and fix the root cause.

Eventually it would be nice to fix this behavior to handle drops more gracefully, and I'd be happy to help propose a patch for that.

yuvadm avatar Jul 01 '20 08:07 yuvadm

@yuvadm did you find something ?

Kristin0 avatar Feb 12 '22 20:02 Kristin0

@Kristin0 Didn't really continue working on this issue, sorry

yuvadm avatar Feb 13 '22 05:02 yuvadm

@Kristin0 If you have this problem. Please share all the details as well.

thedrow avatar Feb 17 '22 12:02 thedrow