amqp icon indicating copy to clipboard operation
amqp copied to clipboard

`closeConnection` can swallow IO exceptions and block / hang indefinitely

Open liminalisht opened this issue 3 years ago • 5 comments

https://github.com/hreinhardt/amqp/blob/3ddd97222881a2d2af9e2d23bdcfd63f507c0b09/Network/AMQP/Internal.hs#L418-L431

closeConnection swallows any IO exceptions it might encounter upon sending the close connection message to the server, and then waits on the server indefinitely, expecting the server to send back a connection_closed_ok message.

One problem I see is that if an IO exception is encountered when sending the server the close connection message, there is no reason to expect that the server has properly received the message, and therefore no reason to expect that the server will respond with a connection_closed_ok message.

In such a case, the MVar contents will never become full, and the call to readMVar will block forever. From the readMVar docs:

Atomically read the contents of an MVar. If the MVar is currently empty, readMVar will wait until it is full.

https://www.stackage.org/haddock/lts-18.22/base-4.14.3.0/Control-Concurrent-MVar.html#v:readMVar

I would propose that closeConnection at a minimum be altered such that readMVar is not called if an IO exception is encountered.

liminalisht avatar Jan 20 '22 17:01 liminalisht

Is this a theoretical concern or can you actually trigger this as a bug?

I'm asking because I believe the code is correct, based on these two assumptions:

  • The only way that writeFrame could fail is if the connection is closed.
  • When the connection is closed (explicitly or by some connection error), we always fill connClosedLock: https://github.com/hreinhardt/amqp/blob/3ddd97222881a2d2af9e2d23bdcfd63f507c0b09/Network/AMQP/Internal.hs#L304

In addition, I think most users will have heartbeats enabled, so that would be another safeguard to prevent infinite blocking.

hreinhardt avatar Jan 20 '22 18:01 hreinhardt

I have in fact experienced closeConnection hanging (and have temporarily gotten around it by invoking it using timeout.)

However, to be clear, I do not know why closeConnection hangs. The explanation I proposed above is indeed (to your point) theoretical. I proposed it because I could not think of another reason for the function hanging.

Given the code you pointed me to, I am now quite perplexed. It looks like the lock should definitely be filled whenever the connection is closed.

That leaves me wondering whether there's really no other way for writeFrame to fail (other than the connection being closed).

liminalisht avatar Jan 25 '22 15:01 liminalisht

After thinking about it some more, you might be right. If writeFrame fails without closing the connection, the code might deadlock. I have no idea if that could ever happen, but it's probably better to be resistant to that.

I've pushed a fix: https://github.com/hreinhardt/amqp/commit/ff169bbb8eec2d62efb4819f56f2c2d679ff4fc5

Does the change look good to you?

hreinhardt avatar Jan 25 '22 21:01 hreinhardt

Yeah, looks good to me. Thanks for the quick response / work!

liminalisht avatar Jan 25 '22 22:01 liminalisht

Ok, I've pushed a new version to Hackage. Curious to see if the blocking issue persists for you.

hreinhardt avatar Jan 26 '22 17:01 hreinhardt