warehouse
Support a Read Only Replica Database
Note:
This is effectively a no-op if you don't have DATABASE_REPLICA_URL defined, other than that the system "thinks" it's hitting the replica (under the covers it's just re-using the same engine to the primary). The difference is basically only exposed as tags on the metrics.
That means even without a replica, this would tell us how much of our traffic could be shunted over to a read-only replica.
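As a rough illustration of the fallback described above, here is a minimal sketch. The `Engine` class, `build_engines`, `metric_tags`, and the `db:` tag format are all hypothetical stand-ins for illustration, not the code in this PR.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Engine:
    """Hypothetical stand-in for a database engine; only the URL matters here."""

    url: str


def build_engines(primary_url: str, replica_url: Optional[str] = None) -> Dict[str, Engine]:
    """Return the engines keyed by logical database name."""
    primary = Engine(primary_url)
    # Assumption: with no replica URL configured, "replica" is simply the
    # same engine object as the primary, so routing to it is a no-op.
    replica = Engine(replica_url) if replica_url else primary
    return {"primary": primary, "replica": replica}


def metric_tags(db_name: str) -> List[str]:
    """Tag metrics with the logical database a request was routed to.

    The tag format is made up for this sketch.
    """
    return [f"db:{db_name}"]
```

With this shape, the metrics still record how many requests were routed to "replica" even when both names resolve to the primary, which is what makes the traffic estimate possible.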
Should replicas be a list, or is the intent that the production replica is a proxy/round-robin DNS?
It could be a list. I left it as a single thing because we're not likely to add multiple of them right now, and when we do there are various strategies for dispatching between them that might make sense:
- Round Robin (requires storing state about what the "next" one is)
- Random (no state, but might not evenly distribute the load)
- "Named" replicas that are targeted for different routes (you could imagine splitting out one just for the simple API for instance).
Since we don't need it at the moment, and we don't really know what strategy would make the most sense, I just punted on doing it until later when we might actually need or want multiple replicas.
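For reference, the three strategies listed above could look roughly like this. This is only a sketch with hypothetical class names and a made-up `pick(route)` interface; nothing here is part of the PR.

```python
import itertools
import random


class RoundRobin:
    """Cycles through replicas; the iterator is the stored "next" state."""

    def __init__(self, replicas):
        self._cycle = itertools.cycle(list(replicas))

    def pick(self, route=None):
        return next(self._cycle)


class RandomChoice:
    """Stateless selection; load may not be spread evenly."""

    def __init__(self, replicas, rng=None):
        self._replicas = list(replicas)
        self._rng = rng or random.Random()

    def pick(self, route=None):
        return self._rng.choice(self._replicas)


class Named:
    """Maps specific routes to dedicated replicas, with a default for the rest."""

    def __init__(self, by_route, default):
        self._by_route = dict(by_route)
        self._default = default

    def pick(self, route=None):
        return self._by_route.get(route, self._default)
```

Something like `Named({"simple": "replica-simple"}, default="replica-main")` would cover the "one replica just for the simple API" case, with the route and replica names being placeholders.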
One interesting thing: https://github.com/pypi/warehouse/pull/1098 removed the read_only flag from a bunch of views that were otherwise read only, because right now request.read_only includes a strict transaction isolation level.
It might be nicer to divorce database selection from the "read_only" flag so we can send queries to the replica without the SERIALIZABLE isolation level.
I'm going to rebase this PR on top of https://github.com/pypi/warehouse/pull/11989, which should improve our query performance on its own and also makes it simpler to opt more read-only views into using the replica.
This now has the changes from #11989 rolled into it, so until that lands this will be harder to review.
It adds a with_database predicate to allow selecting a specific database without incurring any additional effects.
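To make the separation concrete, here is a small sketch of how database selection and the read_only flag could be treated as independent view options. `ViewOptions` and `session_settings` are hypothetical names for illustration, and the assumption that read_only alone does not pick a database is mine, not something the PR states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ViewOptions:
    # Per the thread, read_only currently implies a strict isolation level.
    read_only: bool = False
    # Hypothetical value for a with_database option, e.g. "replica".
    with_database: Optional[str] = None


def session_settings(opts: ViewOptions) -> dict:
    """Derive session settings, keeping the two options independent."""
    # Database selection depends only on with_database (assumption).
    database = opts.with_database or "primary"
    # Isolation depends only on read_only; None means "leave the default".
    isolation = "SERIALIZABLE" if opts.read_only else None
    return {"database": database, "isolation_level": isolation}
```

Under these assumptions, a view declared with only `with_database="replica"` would get the replica without the SERIALIZABLE isolation level.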
This now supports a list of replicas, and when a list is given it will route to the replica that has the fewest open connections to it [^1].
[^1]: This only takes local connections into account; it does not take a global view.
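The least-connections routing can be pictured with a sketch like the one below. `LeastConnectionsRouter` and its methods are hypothetical and keep their own counters; the actual PR may track open connections differently. As the footnote says, the counts are local to one process.

```python
class LeastConnectionsRouter:
    """Pick the replica with the fewest connections opened by this process."""

    def __init__(self, replicas):
        # Local bookkeeping only: other processes' connections are invisible.
        self._open = {name: 0 for name in replicas}

    def checkout(self):
        # min() returns the first replica with the lowest count, so ties
        # go to whichever replica was listed first.
        name = min(self._open, key=self._open.get)
        self._open[name] += 1
        return name

    def release(self, name):
        # Call when a connection to the named replica is closed.
        self._open[name] -= 1
```

A caller would pair each `checkout()` with a `release(name)` once the connection is returned, so the counts reflect connections that are currently open.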
LGTM, can't approve it since it's my PR.