musicbrainz-server
musicbrainz-server copied to clipboard
Add a read-only database connector
Problem
Originally I was working on MBS-13337, but I'm not sure if focusing on background tasks is the best route, so am trying something different. If this looks reasonable, we can update that ticket or open a new one.
The more general problem is that we aren't making use of our standby enough for read-only queries.
Solution
This adds an ro_connector
to the server context which may be used to distribute read-only queries to our standby.
- It requires a
READONLY
database configured. In production,READONLY
would probably point to thepgbouncer-any
service, which connects to either the primary or standby. -
SET TRANSACTION READ ONLY
is always set on thero_connector
(which is desirable in case it connects to the primary). - A
USE_READONLY_DATABASE
DBDefs flag is required to usero_connector
, becauseREADONLY
may not be configured. (Perhaps we can just check if it's configured and not equal toREADWRITE
.)
We can't simply go through the codebase and find and replace $c->sql
with something like $c->ro_sql
. Rather, we need something like $c->prefer_ro_sql
, which is what I've added here. The reason is that many read-only queries are executed within a writing transaction, and must use the existing READWRITE
connection.
As a test, I've modified query_to_list
and query_to_list_limited
to use $c->prefer_ro_sql
.
Reports are currently efficient in that they are read and written in a single query (via something like SELECT INTO report_table FROM ( report_query )
). Running report_query
in a separate connection would mean reading the results in chunks, through Perl, and inserting them (in chunks) through the writable connection.
- This would likely reduce load on the primary database server, but also increase the time it takes to run the reports overall.
- Unless we add another
direct_ro_connector
(remember thatREADONLY
will likely point topgbouncer-any
), thereport_query
may go to the primary anyway, which makes the outcome worse for no benefit. - Distributing more read-only queries throughout the codebase (using
Data::Role::QueryToList
as a starting example) is still desirable and will reduce the impact of background queries (as there will be less load on the primary server overall).
Testing
I have not tested this with a separate READONLY
connector yet.
The idea seems sensible to me in theory at least - let's see what @yvanzo thinks. Perl tests did not run here for some reason, and we should probably have some actual written tests that use two separate connectors?
I'm working on splitting the commits and improving the commit messages, plus will hopefully have a bit of time to add a test.
Should there be a separate ticket for querying the read-only database for website with logged-out users and for web service API? Or just generalizing the current ticket to something like “Use standby Postgres instances for asynchronous read-only queries”?
I updated the existing ticket, but mentioned that it will only be used for Data::Role::QueryToList
and some background tasks to start with, if that sounds okay. (We may open additional tickets later to use the connector in other parts of the code.)