gh-ost
gh-ost copied to clipboard
Documentation - switching to row based replication and data loss
If you are running gh-ost on master (ie --allow-master-master, --allow-on-master) with anything other than row based binary logs the switch to row based needs to be handled carefully (more carefully than currently documented)
We found that many of our connections to mariadb are very long lasting (we recycle the connections a lot) - after a great deal of experimentation after some significant data loss it was discovered that the connections that were made prior to the switch to row based binary logging retained the mixed setting until closed. Transactions that took place on these connections are not shuttled into the building _gho table
These binary logs entries are silently ignored by gh-ost
While the help message indicates that this should run on replica
, gh-ost
does not actually prevent you from switching on primary. Solution: the feature must only work on a replica and fail (exit with error) if on primary.
Addressed downstream by https://github.com/openark/gh-ost/pull/23
Hi
Just wanted to clarify - this is not related to --switch-to-rbr running on a master
We normally run MIXED mode in master-master and have always switched to row based prior to running gh-ost using :
set global binlog_format='ROW'
The point I was trying to make is that mysql / mariadb DO NOT immediately switch open connections based in the command above - any long running connections that are still open and doing inserts etc after the above command has been issued still run in the binary logging mode they were initiated with (in our case MIXED). This leads to gh-ost ignoring any transactions that run on those connections subsequently leading to data loss
This is not a criticism of gh-ost (it's not really even a criticism of MariaDB) - it's just worth mentioning in the documentation I think (as a warning) as it's a source of data loss that was not easy to track down/not obvious
Hope that helps clarify
Got it. Thank you for clarifying! You are correct and this is a behavior of mysql&mariadb, and I'm open to suggestions on how to document this best.
Ok, maybe something like the following :
In [https://github.com/github/gh-ost/blob/master/doc/requirements-and-limitations.md]
Due to MySQL / MariaDB global setting changes only applying to new connections you should ensure that when switching to ROW based binary logging that all connections are re-established prior to running gh-ost operations - failure to do so may result in gh-ost being unable to interpret transactions/data loss : reference [https://mariadb.com/kb/en/set/#global-session] - this is particularly a problem with -allow-on-master (using slave binary logs does not suffer with this as slave replication threads are very short lived)
@shlomi-noach / @marnixgb with the right tweaks I think the performance_schema
could provided an indicator that this is the case
I think it will need some tweaking but this shows me all MySQL sessions with binlog_row_image=MINIMAL
on 5.7:
SELECT threads.processlist_user, vars.variable_value FROM performance_schema.threads
LEFT JOIN performance_schema.variables_by_thread vars ON threads.thread_id=vars.thread_id
WHERE threads.processlist_command IN ('Query','Sleep')
AND vars.variable_name='binlog_row_image' AND vars.variable_value='MINIMAL';
Perhaps a warning or error could be generated with this data 🤔?
I do like where you are going with that - but we don't operate performance_schema unfortunately (maybe we should) - a lot of people don't
In the rawest form you could gather the full processlist "Time" column - if any connections time implies connections prior to the running of gh-ost there is a risk that the connection could update in the wrong format (when gh-ost has initiated the switch to RBR of course)
But that's all still a bit fuzzy - probably enough to throw a warning though