
MDEV-36290: ALTER TABLE with multi-master can cause data loss

Open: bnestere opened this issue 7 months ago (1 comment)

Data loss can occur in multi-master setups when 1) both masters update the same table, 2) ALTER TABLE is run on one master and re-arranges the column ordering, and 3) transactions are binlogged with binlog_format=ROW. This happens because a slave identifies the columns to update by column index. That is, if a transaction updates a table with N columns on the master, the binary log ROW event stores its data column by column, in order, from the first column to the Nth column. When the slave applies this row to its table, it simply fills each column of its new row with these values, in the same order. If the slave's table has its columns in a different order (because some ALTER TABLE added, removed, or re-arranged them), the values are still written in the master table's column order, so they land in the wrong columns. This leads to data loss.
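To make the failure concrete, here is a minimal Python sketch that models index-based application. It is not MariaDB code: the function name `apply_row_by_index` and the example table (`id`, `name`, `email`) are hypothetical, and the sketch assumes the master and slave tables have the same number of columns and that an ALTER TABLE on the slave side only swapped two of them.

```python
from typing import Dict, List


def apply_row_by_index(slave_columns: List[str], row_values: List[object]) -> Dict[str, object]:
    """Hypothetical model of index-based application: the i-th value in the
    ROW event is written to the i-th column of the slave's table, whatever
    that column happens to be called."""
    # zip() pairs values and columns purely by position; this sketch assumes
    # both tables have the same number of columns.
    return dict(zip(slave_columns, row_values))


# Master table is (id, name, email); the ROW event records values in that order.
master_columns = ["id", "name", "email"]
event_values = [1, "alice", "alice@example.com"]

# On the slave, an ALTER TABLE moved `email` in front of `name`.
slave_columns = ["id", "email", "name"]

applied = apply_row_by_index(slave_columns, event_values)
print(applied)  # {'id': 1, 'email': 'alice', 'name': 'alice@example.com'}
```

The name ends up in `email` and the address in `name`: nothing fails, but the row is silently corrupted.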

This patch lets a slave look up columns by name when applying ROW events, so if the column ordering differs between the master and the slave, the slave can still apply each data change to the correct column. This is limited to masters that binlog events with binlog_row_metadata=FULL, as this option extends the Table_map_log_event metadata to include column names. When this column name metadata is missing, the lookup behavior is unchanged (i.e. the slave uses the column index, as before).
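The lookup rule described above can be modeled as follows. This is a hypothetical Python sketch, not the patch itself: `apply_row` and its parameters are illustrative names, `master_column_names` stands in for the column names carried in the Table_map_log_event metadata under binlog_row_metadata=FULL, and silently skipping master columns that the slave lacks is an assumption, since the description does not say how that case is handled.

```python
from typing import Dict, List, Optional


def apply_row(
    slave_columns: List[str],
    row_values: List[object],
    master_column_names: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Hypothetical model of the lookup rule: map values by column name when
    name metadata is present, otherwise fall back to column index."""
    if master_column_names is None:
        # No column name metadata: pre-patch behavior, match by position.
        return dict(zip(slave_columns, row_values))
    if len(master_column_names) != len(row_values):
        raise ValueError("column name metadata does not match the number of values")
    known = set(slave_columns)
    result: Dict[str, object] = {}
    for name, value in zip(master_column_names, row_values):
        if name in known:
            result[name] = value
        # Assumption: a master column that does not exist on the slave is skipped.
    return result


master_columns = ["id", "name", "email"]
values = [1, "alice", "alice@example.com"]
slave_columns = ["id", "email", "name"]  # reordered by ALTER TABLE

with_names = apply_row(slave_columns, values, master_columns)
without_names = apply_row(slave_columns, values)
print(with_names)     # {'id': 1, 'name': 'alice', 'email': 'alice@example.com'}
print(without_names)  # {'id': 1, 'email': 'alice', 'name': 'alice@example.com'}
```

Passing `None` for `master_column_names` reproduces the pre-patch behavior, which is why the fallback is safe for masters that do not log column names: nothing changes for them.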

The patch is still a draft. Extensive MTR testing is still TODO, as is addressing the various TODOs in the code. The current MTR tests only reproduce the scenario from the JIRA description, so they are minimal and just demonstrate the issue in general.

bnestere, Apr 11 '25 21:04