amqp
amqp copied to clipboard
`closeConnection` can swallow IO exceptions and block / hang indefinitely
https://github.com/hreinhardt/amqp/blob/3ddd97222881a2d2af9e2d23bdcfd63f507c0b09/Network/AMQP/Internal.hs#L418-L431
closeConnection
swallows any IO exceptions it might encounter upon sending the close connection message to the server, and then waits on the server indefinitely, expecting the server to send back a connection_closed_ok message.
One problem I see is that if an IO exception is encountered when sending the server the close connection message, there is no reason to expect that the server has properly received the message, and therefore no reason to expect that the server will respond with a connection_closed_ok message.
In such a case, the MVar
contents will never become full, and the call to readMVar
will block forever. From the readMVar
docs:
Atomically read the contents of an MVar. If the MVar is currently empty, readMVar will wait until it is full.
https://www.stackage.org/haddock/lts-18.22/base-4.14.3.0/Control-Concurrent-MVar.html#v:readMVar
I would propose that closeConnection
at a minimum be altered such that readMVar
is not called if an IO exception is encountered.
Is this a theoretical concern or can you actually trigger this as a bug?
I'm asking because I believe the code is correct, based on these two assumptions:
- The only way that
writeFrame
could fail is if the connection is closed. - When the connection is closed (explicitly or by some connection error), we always fill connClosedLock: https://github.com/hreinhardt/amqp/blob/3ddd97222881a2d2af9e2d23bdcfd63f507c0b09/Network/AMQP/Internal.hs#L304
In addition, I think most users will have heartbeats enabled, so that would be another safeguard to prevent infinite blocking.
I have in fact experienced closeConnection
hanging (and have temporarily gotten around it by invoking it using timeout
.)
However, to be clear, I do not know why closeConnection
hangs. The explanation I proposed above is indeed (to your point) theoretical. I proposed it because I could not think of another reason for the function hanging.
Given the code you pointed me to, I am now quite perplexed. It looks like the lock should definitely be filled whenever the connection is closed.
That leaves me wondering whether there's really no other way for writeFrame
to fail (other than the connection being closed).
After thinking about it some more, you might be right. If writeFrame fails without closing the connection, the code might deadlock. I have no idea if that could ever happen, but it's probably better to be resistant to that.
I've pushed a fix: https://github.com/hreinhardt/amqp/commit/ff169bbb8eec2d62efb4819f56f2c2d679ff4fc5
Does the change look good to you?
Yeah, looks good to me. Thanks for the quick response / work!
Ok, I've pushed a new version to Hackage. Curious to see if the blocking issue persists for you.