memcached-session-manager icon indicating copy to clipboard operation
memcached-session-manager copied to clipboard

Better handle Redis failure

Open msellinger opened this issue 8 years ago • 3 comments

This patch handles failure of Redis operations in a better way and is aimed to especially allow failure of an Amazon Elasticache node to trigger immediate logging and reconnection of all Redis connection MSM currently uses. The current code has multiple deficiencies in this respect, especially

(a) it wouldn't detect the failure in all cases since not all failures lead to a JedisConnectionException (the new code catches Exception instead of JedisConnectionException)

(b) It reconnects all Redis connections even when a single failure occurs to make sure when the failure is detected no old defunct connections are laying around (this might lead to hard-to-track down errors at later)

(c) It does proper INFO / ERROR logging to allow debugging the problem later in logs instead of just doing DEBUG logging which is likely switched off on a live system

Note that this patch assumes that every Redis error is an indication that the node failed, which however should be safe to assume in production systems.

msellinger avatar Jan 15 '18 15:01 msellinger

@msellinger what's the status here, are you waiting for me merging this or are you going to change anything?

magro avatar Mar 14 '18 22:03 magro

@magro This is to inform you that we are currently testing this patch (based on the current MSM version), we will also switch to Tomcat 8.5 in the process. Once everything is tested, I will report back, then I will apply again for including this in mainline.

msellinger avatar May 03 '19 11:05 msellinger

We have the current master plus this pull request running in production for several weeks now and did not notice any regressions. From my side, this pull request is ready to be merged.

msellinger avatar Jun 07 '19 10:06 msellinger