readyset
readyset copied to clipboard
Implement GTID based replication
Description
When available we should be using GTID instead of Binary log file and position as it will make failover easier. A few things to consider:
- Since we have per table binlog position and we have selective table replication, we need to ensure that we don't create GTID gaps for events that we skip.
- MySQL added support for GTID Tags, so the layout might look uuid:seq_id or uuid:tag:seq_id - we should be able to store both
- MariaDB layout is also different, we should check if we can easily implement this and comply with both formats
- Once this is done, we should register as a replica using auto_position=1 (GTID)
https://dev.mysql.com/doc/refman/8.4/en/replication-gtids-concepts.html
https://mariadb.com/kb/en/gtid/
Change in user-visible behavior
Requires documentation change
A couple of notes:
- MySQL added the GTID tags in 8.4. afaict, They do not appear to carry important information about the event itself, it's more of an additional metadata field. It's currently unclear to me if that tag information would be required for correctness when restarting a binlog dump. Further,
mysql-common::binlog::GtidEventdoes not parse this field yet (as of v0.35.1). I believe supporting GTID tags can be deferred as 8.4 was only released recently (Apr 2024 - which is recent wrt databases). - wrt MariaDB, here's what i found (thanks, ChatGPT)
- there's no distinct
GtidEventmessage; they just bake that information into the event header. - The GTID format is slightly different:
domain_id-server_id-sequence_number, there's a higher-level domain concept. - At replication start time, mariadb uses the standard
COM_BINLOG_DUMP(with GTID), vs mysql's newCOM_BINLOG_DUMP_GTID
- there's no distinct
The differences with mariadb don't seem to be a dramatic deal breaker in order to reuse the same readyset MySqlBinlogConnector. I don't know how well mysql-common deals with mariadb, in general, and i'm not sure how well it handles the mariadb replication. We can probably defer support for mariadb from the initial GTID support for mysql, and it might not be a heavy lift to support maria in the future.
I'm going to put this effort on pause for (at least?) a few weeks, but capturing some thoughts/braindump here (this might just be notes to self
- The existing
ReplicationOffsetis an enum with two variants:MySqlandPostgres- one for each database type we currently support. Each of those variants has a single field that is a struct (MySqlPosition,PostgresPosition) which contains the proper coordinates for binlog/wal replication. My initial inclination was to alterMySqlPositionfrom a struct to an enum, with two variants: one representing the standard binlog offset (whatMySqlPositioncurrent holds), and a new one for GTIDs. To make a very long story short, I ran into a lot of problems around serialization/deserialization due to 1) switching from a struct to an enum (basically, a level of indirection), and 2) the fact that we have 3 different serde algos that operate overReplicationOffset:bincode(used inreadyset-client<-> controller requests),rmp-serde(used to store theDfState, which contains aReplicationOffsetfor the schema as a whole, into the Authority standalone mode's rocksdb), andserde_json(used for base table storage into rocksdb). It might be possible to craft some deserialization to make this work (I tried, for two days), I think the most reasonable solution is to just add a new enum variant to the high-levelReplicationOffset. Perhaps not as aesthetically clean, but everything will work. - After the
ReplicationOffset, we only need to update the replicatior components for mysql:MySqlReplicator(for snapshotting) andMySqlBinlogConnector(for following the binlog updates). This is mostly straight-forward. - The real trick is grokking how the GTID sets work: the trick is how to determine if you've received all the updates you are suppossed to have. We only receive
GTID_EVENTfrom the binlog stream, but we need to map that against what he have already received. And also on startup we need to see if we've fallen too far behind on replication, and if we should re-snapshot. I did not get too far into this, but the key will be to make sure of thegit_executedandgtid_purgedglobal variables. Check the mysql docs, especially this page.
I have some branches locally on my disk with some of the generative work I've done so far.
random (perhaps useless) links I had opened on purging (I was trying to borrow notes/ideas from debezium):