swindon Fix inactivity callback in clustered setup

Open tailhook opened this issue 8 years ago • 2 comments

Jul 04 '17 11:07 tailhook

Well, I don't understand how 2dfe91a fixes the issue.

Jul 05 '17 22:07 tailhook

The proposed strategy is:

1. Sync all inactivity timers across all replicating nodes, probably by batching the updates with 100 ms to 1 s latency.
2. Split the session namespace into buckets using consistent hashing, and assign a 1/n share of the sessions to every node.
3. Notify the other nodes about inactivity callbacks that have been sent, using a technique similar to (1).
4. Assign each node's buckets to the next servers with a delay, i.e.:
   - buckets of server2 to server1 with a delay of 10 seconds
   - buckets of server3 to server2 with a delay of 10 seconds
   - buckets of server3 to server1 with a delay of 20 seconds, and so on
5. Cancel calling the handler if another server reports that it has already sent the callback.
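To make the delay schedule concrete, here is a minimal Python sketch. It is only an illustration, not swindon's implementation. The names `bucket_of`, `fallback_delay`, and `schedule` are hypothetical, plain modulo hashing stands in for real consistent hashing, and the wrap-around (server1's buckets falling back to the last server) is an assumption, since the list above only spells out the server2 and server3 cases.

```python
import hashlib

STEP_SECONDS = 10  # delay added per fallback hop, taken from the example above


def bucket_of(session_id: str, num_buckets: int) -> int:
    """Map a session id to a bucket with a stable hash.

    Plain modulo hashing is used here as a stand-in for consistent hashing.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def fallback_delay(owner_index: int, server_index: int, num_servers: int) -> int:
    """Seconds `server_index` waits before handling a bucket owned by `owner_index`.

    The owner waits 0 seconds, the previous server 10, the one before it 20,
    and so on. Wrapping around past the first server is an assumption.
    """
    hops = (owner_index - server_index) % num_servers
    return hops * STEP_SECONDS


def schedule(owner_index: int, servers: list) -> dict:
    """Delay for every server, for buckets whose primary owner is `owner_index`."""
    n = len(servers)
    return {name: fallback_delay(owner_index, i, n) for i, name in enumerate(servers)}


servers = ["server1", "server2", "server3"]
# Buckets owned by server3: server2 takes over after 10 s, server1 after 20 s.
print(schedule(2, servers))
```

Under these assumptions every server can compute the whole schedule locally from the server list, so no coordination is needed to decide who acts as a fallback.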

With this strategy, if one of the servers fails or lags too much, its inactivity callbacks are delayed by just 10 seconds, but all of them are still sent (in complex failure scenarios some may be duplicated, which is fine). It also avoids introducing any complex failure detection or leader election algorithms.
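The failure behaviour can be illustrated with a small simulation. This is a hedged sketch rather than the project's code: `simulate` is a hypothetical helper, servers are identified by 0-based index, the fallback order wraps around (an assumption), and `reports_delivered=False` models a scenario where "already sent" notifications never reach the other nodes.

```python
STEP_SECONDS = 10  # delay added per fallback hop, as in the proposal


def simulate(owner_index, alive, reports_delivered=True, step=STEP_SECONDS):
    """Return a list of (server_index, delay_seconds) callback sends.

    `alive[i]` tells whether server i is up. Servers are tried in fallback
    order: the owner first, then the previous server, and so on (wrapping
    around is an assumption of this sketch).
    """
    n = len(alive)
    sends = []
    for hops in range(n):
        server = (owner_index - hops) % n
        if not alive[server]:
            continue  # a dead server never fires its timer
        if sends and reports_delivered:
            continue  # a peer reported the callback as sent: cancel the handler
        sends.append((server, hops * step))
    return sends


# All servers healthy: the owner (index 1) sends immediately.
print(simulate(1, [True, True, True]))
# Owner is down: the next server sends after 10 seconds.
print(simulate(1, [True, False, True]))
# Owner is down and reports are lost: the callback is duplicated.
print(simulate(1, [True, False, True], reports_delivered=False))
```

In this model a single failed server costs one 10-second step, and duplicates appear only when the "already sent" reports are lost, which matches the trade-off described above.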

@popravich ?

Jul 21 '17 13:07 tailhook
