
There is a race between RMQ shutting itself down after heartbeat failure and other operations.

Open fmstephe opened this issue 3 years ago • 0 comments

Currently, once RMQ has established a connection, it sends a heartbeat every second. Each heartbeat sets a value with a 1 minute TTL, which tells the queue cleaner that this connection is still live and should not be removed and cleaned up.

https://github.com/adjust/rmq/blob/master/connection.go#L110

If the heartbeat operation fails 45 times in a row, i.e. Redis is unreachable for 45 seconds, the connection stops itself and shuts down all existing consumers.

https://github.com/adjust/rmq/blob/master/connection.go#L136
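The failure counting described above can be sketched roughly as follows. This is a minimal illustration, not rmq's actual code: `heartbeatTracker`, `observe`, `heartbeatErrorLimit` and `errRedisDown` are hypothetical names, and the sketch assumes that a successful heartbeat resets the counter, as "45 times in a row" implies.

```go
package main

import (
	"errors"
	"fmt"
)

// heartbeatErrorLimit mirrors the 45 consecutive failures described in the issue.
const heartbeatErrorLimit = 45

// errRedisDown is a stand-in error for a failed heartbeat write.
var errRedisDown = errors.New("redis unreachable")

// heartbeatTracker is a hypothetical type that counts consecutive
// heartbeat failures. It is not part of rmq.
type heartbeatTracker struct {
	consecutiveErrors int
	stopped           bool
}

// observe records the result of one heartbeat attempt and reports
// whether the connection should now be considered stopped.
func (t *heartbeatTracker) observe(err error) bool {
	if t.stopped {
		return true
	}
	if err == nil {
		// Assumption: any success resets the failure streak.
		t.consecutiveErrors = 0
		return false
	}
	t.consecutiveErrors++
	if t.consecutiveErrors >= heartbeatErrorLimit {
		t.stopped = true
	}
	return t.stopped
}

func main() {
	tracker := &heartbeatTracker{}
	// One loop iteration stands in for one heartbeat per second.
	for second := 1; second <= 60; second++ {
		if tracker.observe(errRedisDown) {
			fmt.Printf("connection stopped after %d failed heartbeats\n", second)
			return
		}
	}
}
```

Once `observe` has returned true, nothing in this model prevents other code from still using the connection, which is the gap the rest of this issue describes.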

If the connection has shut itself down due to heartbeat failure, calls to either *redisConnection.OpenQueue() or *redisQueue.StartConsuming() will still attempt to write data into Redis for the connection which has already shut itself down.

The queue data that is written to Redis in this case will contain key values for a connection which no longer exists.
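To make the problem concrete, here is a small model of it, together with one possible guard. Everything in it is an assumption for illustration: `fakeStore` stands in for Redis, the key layout in `queuesKey` is made up, and `openQueueGuarded` is only one way the check could be done, not a fix that rmq has adopted.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errConnectionStopped = errors.New("connection stopped after heartbeat failure")

// fakeStore stands in for Redis: it maps a set key to its members.
type fakeStore struct {
	sets map[string][]string
}

func (s *fakeStore) sAdd(key, member string) {
	s.sets[key] = append(s.sets[key], member)
}

// connection is a hypothetical model of a queue connection.
type connection struct {
	mu      sync.Mutex
	name    string
	stopped bool
	store   *fakeStore
}

// queuesKey returns an illustrative key; the real key layout may differ.
func (c *connection) queuesKey() string {
	return "connection::" + c.name + "::queues"
}

// stop marks the connection as shut down, as the heartbeat would on failure.
func (c *connection) stop() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.stopped = true
}

// openQueueUnguarded models the behaviour in this issue: it writes
// queue data without checking whether the connection is still live.
func (c *connection) openQueueUnguarded(queue string) {
	c.store.sAdd(c.queuesKey(), queue)
}

// openQueueGuarded checks the stopped flag and performs the write under
// the same lock, so stop() cannot slip in between the check and the write.
func (c *connection) openQueueGuarded(queue string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.stopped {
		return errConnectionStopped
	}
	c.store.sAdd(c.queuesKey(), queue)
	return nil
}

func main() {
	store := &fakeStore{sets: map[string][]string{}}
	conn := &connection{name: "worker-1", store: store}
	conn.stop() // heartbeat gave up

	conn.openQueueUnguarded("emails")
	fmt.Println("entries written for a dead connection:", len(store.sets[conn.queuesKey()]))

	err := conn.openQueueGuarded("emails")
	fmt.Println("guarded call returned:", err)
}
```

Returning an error from the guarded call lets the caller decide whether to open a fresh connection instead of silently writing data that nothing will consume.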

This race is unlikely under most usage scenarios, because it requires Redis to be:

  1. available to establish the connection
  2. unavailable for more than 45 seconds
  3. available again when we try to create a new queue or start consuming on one

However, if the client system expects Redis to be unavailable at times but does not want its service to shut down during those periods, any persistent retry mechanism it implements makes this scenario more likely to occur.
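The timeline can be walked through with a small simulation. This is only a sketch under simplifying assumptions: time advances in one-second ticks, the heartbeat and the client's retry both run once per tick, and `simulate`, `errUnavailable` and `heartbeatErrorLimit` are hypothetical names rather than anything from rmq.

```go
package main

import (
	"errors"
	"fmt"
)

const heartbeatErrorLimit = 45

var errUnavailable = errors.New("redis unavailable")

// simulate steps through one-second ticks. Redis is unavailable for
// ticks in [downFrom, downUntil). A client starts trying to open a
// queue at downFrom and retries every tick until a write succeeds.
// It returns the tick at which the connection stopped itself and the
// tick at which the retried write went through (0 means never).
func simulate(downFrom, downUntil, totalTicks int) (stoppedAt, wroteAt int) {
	failures := 0
	for tick := 1; tick <= totalTicks; tick++ {
		var err error
		if tick >= downFrom && tick < downUntil {
			err = errUnavailable
		}

		// Heartbeat: count consecutive failures until the limit is hit.
		if stoppedAt == 0 {
			if err != nil {
				failures++
				if failures >= heartbeatErrorLimit {
					stoppedAt = tick
				}
			} else {
				failures = 0
			}
		}

		// Client retry: the first successful attempt writes queue data.
		if wroteAt == 0 && tick >= downFrom && err == nil {
			wroteAt = tick
		}
	}
	return stoppedAt, wroteAt
}

func main() {
	// A 50 second outage outlasts the heartbeat limit.
	stoppedAt, wroteAt := simulate(2, 52, 60)
	fmt.Println("stopped at tick", stoppedAt, "retry wrote at tick", wroteAt)
	if stoppedAt != 0 && wroteAt > stoppedAt {
		fmt.Println("the write landed after the connection had shut down")
	}
}
```

With an outage shorter than the limit, for example `simulate(2, 12, 60)`, the connection never stops and the retried write is harmless; the stale write only appears when the retry outlives the heartbeat.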

fmstephe, May 20 '21 15:05