Fix Redis Sentinel configuration
Describe the bug
Currently, `renku` services do not use Redis Sentinel correctly. As a result, services that depend on Redis can lose their connection when Redis restarts and become inoperable. To fix this, the `renku` helm chart must allow the sentinels to be specified and propagate them to the appropriate services, and those services must pick up and apply this configuration.
Link to project N/A.
To Reproduce Terminate the Redis master pod and observe that some services lose their connection to Redis.
Expected behavior Services dependent on Redis should use the Sentinel mechanisms to become aware of a change in Redis master.
Screenshots and/or execution output N/A.
Run environment (please complete the following information): N/A.
Additional context
Have discussed this with @Panaetius and thought it best to include an issue here. There are related issues for the different `renku` components:
- https://github.com/SwissDataScienceCenter/renku-python/issues/3204
- https://github.com/SwissDataScienceCenter/renku-gateway/issues/605
- https://github.com/SwissDataScienceCenter/renku-ui/issues/2140
Proposal:
- Add the field `global.redis.sentinel.sentinelList` to the `values.yaml`
- This can contain a comma-separated string of URIs as follows:
sentinelList: "redis-sentinel://renku-redis-node-0.renku-redis-headless:26379,redis-sentinel://renku-redis-node-1.renku-redis-headless:26379,redis-sentinel://renku-redis-node-2.renku-redis-headless:26379"
- Notes:
  - the above `sentinelList` does not provide for passwords - it's not clear to me that there is any point/benefit in adding a password to the sentinel in our configuration (currently the sentinel is password protected).
  - `redis-sentinel://` does not seem to be IANA recognized; however, it is clear and it is used by the Java lettuce library
  - using a string in this manner allows us to increase the number of sentinels quite easily if necessary; it seems a more natural solution than distributing lists of hosts and ports, even if there is some cost in ensuring the string contains no errors
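As a rough illustration of how a service might consume such a string, here is a minimal Python sketch that splits the proposed `sentinelList` value into host/port pairs. The function name `parse_sentinel_list` and the fallback to port 26379 when a URI omits the port are assumptions made for this example, not part of any existing renku service.

```python
from typing import List, Tuple
from urllib.parse import urlsplit

# Assumption: fall back to the conventional Sentinel port if a URI omits it.
DEFAULT_SENTINEL_PORT = 26379


def parse_sentinel_list(sentinel_list: str) -> List[Tuple[str, int]]:
    """Split a comma-separated string of redis-sentinel:// URIs into (host, port) pairs."""
    sentinels = []
    for raw in sentinel_list.split(","):
        uri = raw.strip()
        if not uri:
            continue  # tolerate trailing commas and stray whitespace
        parts = urlsplit(uri)
        if parts.scheme != "redis-sentinel" or not parts.hostname:
            # Fail loudly so a typo in the string is caught at startup.
            raise ValueError(f"not a redis-sentinel:// URI: {uri!r}")
        sentinels.append((parts.hostname, parts.port or DEFAULT_SENTINEL_PORT))
    return sentinels
```

Rejecting malformed entries at startup is one way to limit the cost of an error in the string mentioned in the notes above.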
Logic can be added to the individual services as follows:
- if the sentinel list is provided, use this, otherwise operate as before
- this will mean no breaking changes
- we can revisit this subsequently and decide if we want to change the behaviour after this migration has been performed.
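The fallback logic described above could look roughly like the following Python sketch, which uses the redis-py client. The environment variable names (`REDIS_SENTINEL_LIST`, `REDIS_MASTER_SET`, `REDIS_HOST`, `REDIS_PORT`, `REDIS_PASSWORD`) and the master set name `mymaster` are hypothetical placeholders; each service would substitute whatever the helm chart actually passes to it.

```python
import os
from typing import List, Mapping, Tuple
from urllib.parse import urlsplit

# Hypothetical names, not taken from the chart or any renku service.
SENTINEL_LIST_ENV = "REDIS_SENTINEL_LIST"
MASTER_SET_ENV = "REDIS_MASTER_SET"


def sentinel_addresses(env: Mapping[str, str]) -> List[Tuple[str, int]]:
    """Return (host, port) pairs from the sentinel list, or [] if it is unset."""
    uris = [u.strip() for u in env.get(SENTINEL_LIST_ENV, "").split(",") if u.strip()]
    parsed = [urlsplit(u) for u in uris]
    return [(p.hostname, p.port or 26379) for p in parsed if p.hostname]


def connect(env: Mapping[str, str] = os.environ):
    """Use Sentinel when a sentinel list is provided, otherwise connect as before."""
    import redis  # redis-py, imported lazily so the parsing above has no dependencies
    from redis.sentinel import Sentinel

    sentinels = sentinel_addresses(env)
    if sentinels:
        # The sentinels are asked for the current master, so the client can
        # follow a failover instead of staying pinned to a dead address.
        master_set = env.get(MASTER_SET_ENV, "mymaster")  # assumed master set name
        return Sentinel(sentinels).master_for(master_set, password=env.get("REDIS_PASSWORD"))
    # No sentinel list: unchanged behaviour, a direct connection to one host.
    return redis.Redis(
        host=env.get("REDIS_HOST", "localhost"),
        port=int(env.get("REDIS_PORT", "6379")),
        password=env.get("REDIS_PASSWORD"),
    )
```

Because the Sentinel branch is only taken when the list is set, existing deployments that do not provide it keep their current connection path.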