readyset icon indicating copy to clipboard operation
readyset copied to clipboard

Implement GTID based replication

Open altmannmarcelo opened this issue 10 months ago • 4 comments

Description

When available we should be using GTID instead of Binary log file and position as it will make failover easier. A few things to consider:

  • Since we have per table binlog position and we have selective table replication, we need to ensure that we don't create GTID gaps for events that we skip.
  • MySQL added support for GTID Tags, so the layout might look uuid:seq_id or uuid:tag:seq_id - we should be able to store both
  • MariaDB layout is also different, we should check if we can easily implement this and comply with both formats
  • Once this is done, we should register as a replica using auto_position=1 (GTID)

https://dev.mysql.com/doc/refman/8.4/en/replication-gtids-concepts.html

https://mariadb.com/kb/en/gtid/

Change in user-visible behavior

Requires documentation change

altmannmarcelo avatar Feb 01 '25 13:02 altmannmarcelo

A couple of notes:

  • MySQL added the GTID tags in 8.4. afaict, They do not appear to carry important information about the event itself, it's more of an additional metadata field. It's currently unclear to me if that tag information would be required for correctness when restarting a binlog dump. Further, mysql-common::binlog::GtidEvent does not parse this field yet (as of v0.35.1). I believe supporting GTID tags can be deferred as 8.4 was only released recently (Apr 2024 - which is recent wrt databases).
  • wrt MariaDB, here's what i found (thanks, ChatGPT)
    • there's no distinct GtidEventmessage; they just bake that information into the event header.
    • The GTID format is slightly different: domain_id-server_id-sequence_number, there's a higher-level domain concept.
    • At replication start time, mariadb uses the standard COM_BINLOG_DUMP (with GTID), vs mysql's new COM_BINLOG_DUMP_GTID

The differences with mariadb don't seem to be a dramatic deal breaker in order to reuse the same readyset MySqlBinlogConnector. I don't know how well mysql-common deals with mariadb, in general, and i'm not sure how well it handles the mariadb replication. We can probably defer support for mariadb from the initial GTID support for mysql, and it might not be a heavy lift to support maria in the future.

jasobrown-rs avatar Apr 16 '25 21:04 jasobrown-rs

I'm going to put this effort on pause for (at least?) a few weeks, but capturing some thoughts/braindump here (this might just be notes to self )

  • The existing ReplicationOffset is an enum with two variants: MySql and Postgres - one for each database type we currently support. Each of those variants has a single field that is a struct (MySqlPosition, PostgresPosition) which contains the proper coordinates for binlog/wal replication. My initial inclination was to alter MySqlPosition from a struct to an enum, with two variants: one representing the standard binlog offset (what MySqlPosition current holds), and a new one for GTIDs. To make a very long story short, I ran into a lot of problems around serialization/deserialization due to 1) switching from a struct to an enum (basically, a level of indirection), and 2) the fact that we have 3 different serde algos that operate over ReplicationOffset: bincode (used in readyset-client <-> controller requests), rmp-serde (used to store the DfState, which contains a ReplicationOffset for the schema as a whole, into the Authority standalone mode's rocksdb), and serde_json (used for base table storage into rocksdb). It might be possible to craft some deserialization to make this work (I tried, for two days), I think the most reasonable solution is to just add a new enum variant to the high-level ReplicationOffset. Perhaps not as aesthetically clean, but everything will work.
  • After the ReplicationOffset, we only need to update the replicatior components for mysql: MySqlReplicator (for snapshotting) and MySqlBinlogConnector (for following the binlog updates). This is mostly straight-forward.
  • The real trick is grokking how the GTID sets work: the trick is how to determine if you've received all the updates you are suppossed to have. We only receive GTID_EVENT from the binlog stream, but we need to map that against what he have already received. And also on startup we need to see if we've fallen too far behind on replication, and if we should re-snapshot. I did not get too far into this, but the key will be to make sure of the git_executed and gtid_purged global variables. Check the mysql docs, especially this page.

I have some branches locally on my disk with some of the generative work I've done so far.

jasobrown-rs avatar Apr 24 '25 18:04 jasobrown-rs