
gh-ost does not copy data on tables with high write load

Open Shukla-Ankur opened this issue 2 years ago • 4 comments

MySQL v8.0.23

I created a table with ~68M rows using sysbench. To simulate a production load, I am running the following sysbench command to generate continuous load on the machine:

sysbench --db-driver=mysql --mysql-user=xxxxx --mysql-password=xxxxx --mysql-db=sbtest --mysql-host=xxxxx --mysql-port=3306 --tables=1 --threads=20 --time=0 --events=0 --rate=1 --report-interval=10 /usr/share/sysbench/oltp_read_write.lua run

When continuous data is being generated like this, gh-ost makes no progress copying the existing rows and is stuck handling only the newer writes.

Copy: 0/67930960 0.0%; Applied: 49272370; Backlog: 1000/1000; Time: 15h24m30s(total), 15h24m30s(copy); streamer: binlog.000179:773080899; Lag: 0.22s, HeartbeatLag: 46161.22s, State: migrating; ETA: N/A
Copy: 0/67930960 0.0%; Applied: 49298550; Backlog: 1000/1000; Time: 15h25m0s(total), 15h25m0s(copy); streamer: binlog.000179:786810019; Lag: 0.02s, HeartbeatLag: 46187.02s, State: migrating; ETA: N/A

What exactly is causing this? Is there a specific configuration I need to use to allow the copy to proceed?

Shukla-Ankur avatar Jul 14 '22 04:07 Shukla-Ankur

:wave: @Shukla-Ankur

It sounds like one of two things is happening, either:

  1. gh-ost prioritises processing binlog changes over copying rows from the existing table, so if you're saturating your database to the point that there's never a break in ongoing transactions, gh-ost will never get a chance to copy the existing data (there's a conceptual sketch of this just after this list) - try reducing the workload you're putting through your database

or:

  2. HeartbeatLag is very high, so check the gh-ost logs for any errors that indicate that gh-ost is unable to generate the heartbeats
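
For illustration, here's a minimal Go sketch (hypothetical, not gh-ost's actual code) of the prioritisation described in point 1: pending binlog events are always drained before a row-copy chunk is attempted, so a backlog that never empties - like the Backlog: 1000/1000 in your status output - starves the copy.

// Hypothetical sketch of binlog-first prioritisation; not gh-ost's real code.
package main

import (
	"fmt"
	"time"
)

func main() {
	// Stand-in for gh-ost's binlog event queue ("Backlog: 1000/1000").
	backlog := make(chan int, 1000)

	// Simulated high write load: a producer that keeps the backlog topped up.
	go func() {
		for i := 0; ; i++ {
			backlog <- i
		}
	}()

	applied, copied := 0, 0
	deadline := time.After(2 * time.Second)

	for {
		select {
		case <-deadline:
			fmt.Printf("applied %d binlog events, copied %d chunks\n", applied, copied)
			return
		default:
		}

		select {
		case <-backlog:
			applied++ // binlog events are always handled first
		default:
			// A chunk is copied only when the backlog is momentarily empty.
			copied++
			time.Sleep(10 * time.Millisecond) // stand-in for the cost of one chunk copy
		}
	}
}

With the producer saturating the queue, the copied-chunk count stays at or near zero, which mirrors the Copy: 0/67930960 0.0% lines above.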

dm-2 avatar Sep 06 '22 17:09 dm-2

Regarding point 1: since you have suggested reducing the workload, my guess is that there is no option to prioritize copying the existing data over the binlog applier.

Shukla-Ankur avatar Oct 21 '22 05:10 Shukla-Ankur

@Shukla-Ankur unfortunately not - binlog changes are prioritised because gh-ost cannot afford to fall too far behind in processing binlog events: on clusters with a high write workload, if gh-ost doesn't prioritise binlog events it risks falling behind and never being able to catch up with the volume of changes.

This is also because binlog events are ephemeral and won't be available indefinitely on the upstream host gh-ost is replicating from. The table copy, on the other hand, reads data that has already been persisted, so there is little risk of falling behind - a large table copy simply takes more time.
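
As a side note on the ephemerality point: on MySQL 8.0 the source's binlog retention is governed by binlog_expire_logs_seconds, so you can check how long you'd have before needed binlogs are purged. A hypothetical Go sketch (assumes the go-sql-driver/mysql driver and a placeholder DSN):

// Hypothetical check of binlog retention on the source; DSN is a placeholder.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Replace user, password and host with your own values.
	db, err := sql.Open("mysql", "user:password@tcp(127.0.0.1:3306)/")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var name, value string
	// MySQL 8.0: retention in seconds (default 2592000, i.e. 30 days).
	if err := db.QueryRow("SHOW VARIABLES LIKE 'binlog_expire_logs_seconds'").Scan(&name, &value); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s = %s seconds\n", name, value)
}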

We have some very high throughput clusters here at GitHub (we are mostly constrained by replication throughput/lag), but we don't have issues with gh-ost never being able to copy the source table. Synthetic workloads like sysbench are good for testing the limits of systems, but aren't representative of real workloads 😄

dm-2 avatar Oct 21 '22 16:10 dm-2