fr: add support for multi-az postgres with read-only replication
Is your feature request related to a problem? Please describe.
Yes, current implementation assumes all postgres data stores have static read-write configurations (i.e. will always be read-write for the lifecycle of the temporal instance).
Describe the solution you'd like
Add support for postgres configurations such that:
- Nodes may utilize dynamic read-only/read-write (HA) postgres servers between multiple instances.
- Where write transactions are required, fall back to available read-write (master) nodes to mitigate issues with HA compatibility.
- Default queries only utilize read-only nodes (optional but bonus points if so).
Describe alternatives you've considered
No alternatives, if HA is enabled between multiple postgres servers (e.g temporal does not check pg_is_in_recovery), temporal will eventually fail upon (postgres) instance recovery / failover.
Additional context
Postgres configured such that:
- 1 master / read-write instance
- 1 or more read-only instance(s)
- Nodes may failover from read-write to read-only (and vice versa) at any point.
Happy to provide docker configurations for reproducible context for the feature request.
Hi @joshua-auchincloss I'm evaluating the feasibility of this feature. This is a very good feature to have. But not sure about the complexity it introduces. Could you please provide the docker setup so that I can try it out?
Unfortunately the Temporal server application cannot clearly distinguish between purely read-only transactions and read-write transactions. So it's not possible for the server to route the queries appropriately.
Unfortunately the Temporal server application cannot clearly distinguish between purely read-only transactions and read-write transactions. So it's not possible for the server to route the queries appropriately.
Hey @gow, thanks for the response - apologies for the delayed response on docker configs, will get to this once there is time to do so.
Regarding distinguishing read-only vs read-write transactions, the client could distinguish at the minimum between writable database hosts at runtime and reroute all queries to that database. Per transaction level is a nice-to-have, but not required. IMO, priority would be the former where all queries are routed to the 'master' write clusters to avoid e.g errors against read-only instances. This approach would be a simple select pg_is_in_recovery()
Hi @joshua-auchincloss looks like the sql connections are all treated as read-write. There is no concept of read-only connections at the moment. A quick temporary fix for this problem could be that when we initiate a connection, we perform pg_is_in_recovery() check and drop that connection. However, this would not work in 2 cases
- PG goes into recovery mode after the connection is established.
- There are no other hosts available to connect and we end up not having any DB connections.