Fix Redis Sentinel configuration

Open seanrmurphy opened this issue 2 years ago • 2 comments

Describe the bug

Currently, renku services do not use Redis Sentinel correctly. As a result, services that depend on Redis can lose their connection when Redis restarts and may become inoperable. To fix this, the renku helm chart needs to be modified so that the sentinels can be specified, this configuration needs to be propagated to the appropriate services, and those services need to pick it up and apply it.

Link to project

N/A.

To Reproduce

Terminate the Redis master pod and see some services having issues.

Expected behavior

Services dependent on Redis should use the Sentinel mechanisms to become aware of a change in Redis master.

Screenshots and/or execution output

N/A.

Run environment (please complete the following information)

N/A.

Additional context

Have discussed this with @Panaetius and thought it best to include an issue here. There are related issues for the different renku components:

  • https://github.com/SwissDataScienceCenter/renku-python/issues/3204
  • https://github.com/SwissDataScienceCenter/renku-gateway/issues/605
  • https://github.com/SwissDataScienceCenter/renku-ui/issues/2140

Proposal:

  • Add the field global.redis.sentinel.sentinelList to the values.yaml
  • This can contain a comma-separated string of sentinel URIs as follows:
sentinelList: "redis-sentinel://renku-redis-node-0.renku-redis-headless:26379,redis-sentinel://renku-redis-node-1.renku-redis-headless:26379,redis-sentinel://renku-redis-node-2.renku-redis-headless:26379"
  • Notes:
    • the above sentinelList does not provide for passwords - it's not clear to me that there is any point/benefit in adding a password to the sentinel in our configuration (currently the sentinel is password protected).
    • redis-sentinel:// does not seem to be IANA recognized; however, it is clear and it is used by the Java lettuce library
    • using a single string in this way lets us increase the number of sentinels quite easily if necessary; it seems a more natural solution than distributing lists of hosts and ports, even though it comes at the cost of having to ensure that the string contains no errors
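
To illustrate how a service might consume the proposed string, here is a minimal Python sketch that splits it into (host, port) pairs and rejects malformed entries early. The function name `parse_sentinel_list` is hypothetical, and falling back to port 26379 (the usual Sentinel port) when a URI omits the port is an assumption, not something agreed in this issue.

```python
from typing import List, Tuple
from urllib.parse import urlsplit

SENTINEL_SCHEME = "redis-sentinel"
DEFAULT_SENTINEL_PORT = 26379  # assumption: used only when a URI has no port


def parse_sentinel_list(sentinel_list: str) -> List[Tuple[str, int]]:
    """Split a comma-separated string of redis-sentinel:// URIs into (host, port) pairs."""
    sentinels = []
    for raw in sentinel_list.split(","):
        uri = raw.strip()
        if not uri:
            continue  # tolerate trailing commas and stray whitespace
        parts = urlsplit(uri)
        if parts.scheme != SENTINEL_SCHEME or not parts.hostname:
            raise ValueError(f"invalid sentinel URI: {uri!r}")
        # parts.port raises ValueError itself if the port is not a valid number
        sentinels.append((parts.hostname, parts.port or DEFAULT_SENTINEL_PORT))
    return sentinels
```

Failing loudly on a bad entry is one way to contain the cost of a typo in the string: the service refuses to start instead of silently running with fewer sentinels than intended.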

Logic can be added to the individual services as follows:

  • if the sentinel list is provided, use this, otherwise operate as before
    • this will mean no breaking changes
  • we can revisit this subsequently and decide if we want to change the behaviour after this migration has been performed.
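
For a Python service, the fallback logic could look roughly like the sketch below, which uses the redis-py `Sentinel` class. The environment variable names (`REDIS_SENTINEL_LIST`, `REDIS_MASTER_SET`, `REDIS_HOST`, `REDIS_PORT`, `REDIS_PASSWORD`) and the `mymaster` default are assumptions for illustration; the real names depend on how the chart passes values to each service. Reusing the Redis password for the sentinels is also an assumption.

```python
from typing import List, Mapping, Tuple
from urllib.parse import urlsplit


def sentinel_addresses(sentinel_list: str) -> List[Tuple[str, int]]:
    """Turn the comma-separated redis-sentinel:// URIs into (host, port) pairs."""
    items = [item.strip() for item in sentinel_list.split(",") if item.strip()]
    return [(p.hostname, p.port or 26379) for p in map(urlsplit, items)]


def make_redis_client(env: Mapping[str, str]):
    """Use Sentinel when a sentinel list is configured, otherwise connect as before."""
    # redis-py is imported lazily so the helper above has no third-party dependency
    import redis
    from redis.sentinel import Sentinel

    password = env.get("REDIS_PASSWORD")  # hypothetical variable name
    sentinel_list = env.get("REDIS_SENTINEL_LIST", "")  # hypothetical variable name

    if sentinel_list.strip():
        sentinel = Sentinel(
            sentinel_addresses(sentinel_list),
            sentinel_kwargs={"password": password},  # assumption: same password
        )
        # The returned client asks the sentinels for the current master
        # whenever its pool opens a new connection, e.g. after a failover.
        return sentinel.master_for(
            env.get("REDIS_MASTER_SET", "mymaster"), password=password
        )

    # No sentinel list provided: unchanged behaviour, a direct connection.
    return redis.Redis(
        host=env.get("REDIS_HOST", "localhost"),
        port=int(env.get("REDIS_PORT", "6379")),
        password=password,
    )
```

A service would call `make_redis_client(os.environ)` once at startup; deployments that do not set the sentinel list keep their existing connection path, so nothing breaks for them.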

seanrmurphy, Nov 15 '22 10:11