musicbrainz-server icon indicating copy to clipboard operation
musicbrainz-server copied to clipboard

Add a read-only database connector

Open mwiencek opened this issue 10 months ago • 1 comments

Problem

Originally I was working on MBS-13337, but I'm not sure if focusing on background tasks is the best route, so am trying something different. If this looks reasonable, we can update that ticket or open a new one.

The more general problem is that we aren't making use of our standby enough for read-only queries.

Solution

This adds an ro_connector to the server context which may be used to distribute read-only queries to our standby.

  • It requires a READONLY database configured. In production, READONLY would probably point to the pgbouncer-any service, which connects to either the primary or standby.
  • SET TRANSACTION READ ONLY is always set on the ro_connector (which is desirable in case it connects to the primary).
  • A USE_READONLY_DATABASE DBDefs flag is required to use ro_connector, because READONLY may not be configured. (Perhaps we can just check if it's configured and not equal to READWRITE.)

We can't simply go through the codebase and find and replace $c->sql with something like $c->ro_sql. Rather, we need something like $c->prefer_ro_sql, which is what I've added here. The reason is that many read-only queries are executed within a writing transaction, and must use the existing READWRITE connection.

As a test, I've modified query_to_list and query_to_list_limited to use $c->prefer_ro_sql.

Reports are currently efficient in that they are read and written in a single query (via something like SELECT INTO report_table FROM ( report_query )). Running report_query in a separate connection would mean reading the results in chunks, through Perl, and inserting them (in chunks) through the writable connection.

  • This would likely reduce load on the primary database server, but also increase the time it takes to run the reports overall.
  • Unless we add another direct_ro_connector (remember that READONLY will likely point to pgbouncer-any), the report_query may go to the primary anyway, which makes the outcome worse for no benefit.
  • Distributing more read-only queries throughout the codebase (using Data::Role::QueryToList as a starting example) is still desirable and will reduce the impact of background queries (as there will be less load on the primary server overall).

Testing

I have not tested this with a separate READONLY connector yet.

mwiencek avatar Apr 21 '24 22:04 mwiencek

The idea seems sensible to me in theory at least - let's see what @yvanzo thinks. Perl tests did not run here for some reason, and we should probably have some actual written tests that use two separate connectors?

reosarevok avatar Apr 22 '24 08:04 reosarevok

I'm working on splitting the commits and improving the commit messages, plus will hopefully have a bit of time to add a test.

mwiencek avatar May 07 '24 13:05 mwiencek

Should there be a separate ticket for querying the read-only database for website with logged-out users and for web service API? Or just generalizing the current ticket to something like “Use standby Postgres instances for asynchronous read-only queries”?

I updated the existing ticket, but mentioned that it will only be used for Data::Role::QueryToList and some background tasks to start with, if that sounds okay. (We may open additional tickets later to use the connector in other parts of the code.)

mwiencek avatar May 21 '24 14:05 mwiencek