Full high-availability (Redis Cluster/Sentinel support)
Description:
Currently (as of v1.102.0), Synapse supports horizontal scaling via workers. As I understand it, workers can handle most request processing largely independently of the main process (of which only one instance can run), so at first glance Synapse looks highly available.
However, using workers requires Redis, and Synapse (again, as of v1.102.0) only supports a single Redis hostname and port. It does not support Redis Sentinel, which would handle election of the Redis master (the write-capable node) and redirect clients to it.
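To illustrate what Sentinel support would change on the client side: instead of connecting to one fixed host and port, a Sentinel-aware client first asks the Sentinels which node is currently the master, and asks again after a failover. Below is a minimal Python sketch of that discovery step, loosely following the Redis Sentinel client guidelines. `discover_master` and the injected `query` callable are hypothetical names, and the network call is stubbed so the logic can be shown without a running Sentinel.

```python
from typing import Callable, List, Optional, Tuple

Address = Tuple[str, int]

# Hypothetical hook: sends "SENTINEL get-master-addr-by-name <name>" to one
# Sentinel and returns (host, port), or None if that Sentinel does not know
# the master. Network failures are expected to surface as OSError.
QueryFn = Callable[[Address, str], Optional[Address]]


def discover_master(sentinels: List[Address], master_name: str,
                    query: QueryFn) -> Address:
    """Return the current master address for `master_name` (sketch only).

    Sentinels are tried in order; the first one that answers is moved to the
    front of the list so it is asked first next time. A real client would
    also verify the answer (e.g. with ROLE) and rerun this after a failover.
    """
    for i, sentinel in enumerate(sentinels):
        try:
            addr = query(sentinel, master_name)
        except OSError:
            continue  # this Sentinel is unreachable; try the next one
        if addr is not None:
            sentinels.insert(0, sentinels.pop(i))  # prefer it next time
            return addr
    raise ConnectionError(f"no Sentinel could resolve master {master_name!r}")


if __name__ == "__main__":
    # Stubbed Sentinels: s1 is down, the others report 10.0.0.5 as master.
    def fake_query(sentinel: Address, name: str) -> Optional[Address]:
        if sentinel[0] == "s1":
            raise OSError("connection refused")
        return ("10.0.0.5", 6379) if name == "mymaster" else None

    sentinels = [("s1", 26379), ("s2", 26379), ("s3", 26379)]
    print(discover_master(sentinels, "mymaster", fake_query))
    print(sentinels[0])  # the Sentinel that answered is now tried first
```

The point of the sketch is that the master address becomes something the client looks up rather than a static setting, which is what a single `host`/`port` pair cannot express.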
I found a PR in the old matrix-org repository that adds Redis Sentinel support, but I am not able to maintain it, so I am opening an issue here instead.
Additionally, I think it would be worthwhile to add guidance to the Synapse documentation on how to achieve a highly available setup. The 2020 post about scaling Synapse seems to show multiple Redis instances in its diagrams, but I can't figure out how to achieve this with Synapse, since supporting Redis Cluster requires a cluster-aware Redis client.
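For context on why Redis Cluster needs a cluster-aware client: the cluster shards keys across 16384 hash slots, computed as CRC16 (XMODEM variant) of the key modulo 16384, hashing only the part inside `{...}` when a non-empty hash tag is present, and each master owns a range of slots. The Python sketch below computes the slot and picks a node from a hypothetical three-master slot map (`SLOT_MAP` and `node_for_key` are illustrative names, not part of any Redis library). A real client learns the map from the cluster and refreshes it when a node replies with a `MOVED` redirect.

```python
from typing import List, Tuple

NUM_SLOTS = 16384  # Redis Cluster always uses 16384 hash slots


def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: bytes) -> int:
    """Map a key to its hash slot, honouring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty {...} section counts as a hash tag.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % NUM_SLOTS


# Hypothetical slot map for three masters. A real cluster-aware client
# fetches this from the cluster (e.g. CLUSTER SHARDS or CLUSTER SLOTS)
# and refreshes it when a node answers with a MOVED redirect.
SLOT_MAP: List[Tuple[int, int, Tuple[str, int]]] = [
    (0, 5460, ("redis-a", 6379)),
    (5461, 10922, ("redis-b", 6379)),
    (10923, 16383, ("redis-c", 6379)),
]


def node_for_key(key: bytes) -> Tuple[str, int]:
    """Pick the node that owns the key's slot in SLOT_MAP."""
    slot = key_hash_slot(key)
    for first, last, node in SLOT_MAP:
        if first <= slot <= last:
            return node
    raise LookupError(f"slot {slot} is not covered by the slot map")


if __name__ == "__main__":
    # Keys sharing a hash tag land on the same slot, hence the same node.
    print(node_for_key(b"{user1000}.following"))
    print(node_for_key(b"{user1000}.followers"))
```

A client that only knows one `host:port` has no way to do this routing, which is why cluster support cannot simply be a configuration change.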
Related issues:
- https://github.com/element-hq/synapse/issues/15478
- https://github.com/element-hq/synapse/issues/13126
- https://github.com/element-hq/synapse/issues/15122 (this appears to be the PR listed below, converted to an issue)
Related PRs:
- https://github.com/matrix-org/synapse/pull/15122
Good that you've opened this issue, @Stogas! I've been comparing solutions, and Element/Matrix checks every box but one: high availability. A single point of failure is not an option: "one == none". Clients should keep working without interruption if a server goes down. HA is key to keeping the infrastructure secure (in the availability sense) and the experience robust. Looking forward to seeing progress in this area. Keep up the good work!
Regarding Redis Cluster/Sentinel support: is Redis still the right database to use in the long term, or is there a move to the Linux Foundation fork (Valkey)?
This would be a very useful feature!
Coming here from https://github.com/matrix-org/synapse/issues/7076 in 2020.
Why was the master/slave idea never pursued?
Instead of a single redis-server, there would have been at least some redundancy.
As someone who self-hosts Matrix, I would love to have a "hot spare" that can run on a different network, since internet outages are quite likely. I imagine the easiest approach would be to enter a main and a backup homeserver URL, but I'm not sure whether that fits the architecture of Matrix/Synapse.
So it turns out there is currently no way to scale horizontally by setting up two nodes?