Performance Improvements (Raw Ideas)
We are looking to improve the performance of gh-ost so that we can safely operate with larger database sizes. Our use case is a bit different from GitHub's in that we do not use read-replicas. Some of the DBs also have very cyclic usage (i.e. only busy 9-5, M-F) and may have windows of free capacity.
I have a few ideas I wanted to run by you, since I'm sure some have come up before:
| Feature | Context | Status |
|---|---|---|
| Use 8.0 ALGORITHM=INSTANT when applicable | Some MySQL 8.0 DDL operations are instant and do not require gh-ost at all. | Merged, thank you kindly for the review! |
| Dynamic Chunk Size | gh-ost can observe the execution time of processing chunks and dynamically increase the chunk size while it stays below a target threshold. For our environment (because we have a lot of replica-lag tolerance) we typically run larger batch sizes, but we have varying DB instance sizes, so having this auto-tune is a win for us (a sketch of the tuning loop follows the PoC link below). | See PR here (and issue comment) |
| Multi-threaded applier | Parallel replication apply is much better in MySQL 8.0, and since we don't use read-replicas we can probably push more changes through the binlog than gh-ost currently does. We think we can tolerate a few minutes of replica lag; our limit is that Aurora restricts the relay log to ~1000M, and exceeding that reduces our DR capabilities. (Note: there's an earlier issue on this. It lacks the 8.0 parallel context, and the issue @shlomi-noach probably hit when he said it is slower is possibly this one? In any case, I've verified I can bulk-parallel insert with improved performance. See the first sketch after this table.) | Not started |
| Defer Binary Log Apply | Currently gh-ost prioritizes applying the binary log ahead of copying rows. I think it's possible to instead track, in memory, only the primary keys discovered in the binary log plus whether the last modification was a delete (a bool). If this is kept in a map, it can be applied after the copy is done. The benefit of this change is most evident in workloads that tend to update the same rows (a sketch follows this table). Edit: this optimization requires mem-comparable primary keys, so it won't work on varchar primary keys with collations. | Not started |
| Resume from failure | I know there is a stale PR for this. It doesn't improve performance, but it's semi-related since some of our long-running DDLs fail. We also like to cycle pods daily on our k8s clusters, so having single processes that run for two weeks complicates our infra. | See branch here. |
| Better ETA estimates | The current ETA estimator is based on estimatedTime - elapsedTime from the start of the copy. This skews poorly for larger tables, which become slower to insert into as they grow, and it won't respond well to changes as dynamic chunk size/throttling is introduced. Ideally the estimate compares how many rows are left to copy against how many rows were copied in the last few minutes (see the ETA sketch after this table). | See PR here |
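To make the multi-threaded applier idea concrete, here is a minimal sketch in Go (gh-ost's own language) of one way to parallelize the apply while preserving per-row ordering: hash each event's primary key to a fixed worker, so events for the same row are never reordered. All names here are hypothetical illustrations, not gh-ost's actual applier code.

```go
// Hypothetical sketch of a multi-threaded binlog applier: events for the
// same primary key must be applied in order, so we route each event to a
// worker chosen by hashing its PK. Names are illustrative, not gh-ost's.
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

type binlogEvent struct {
	pk  string // serialized primary key
	sql string // the DML to apply
}

func applyParallel(events <-chan binlogEvent, workers int, apply func(binlogEvent)) {
	queues := make([]chan binlogEvent, workers)
	var wg sync.WaitGroup
	for i := range queues {
		queues[i] = make(chan binlogEvent, 64)
		wg.Add(1)
		go func(q <-chan binlogEvent) {
			defer wg.Done()
			for ev := range q {
				apply(ev) // each worker would hold its own DB connection in practice
			}
		}(queues[i])
	}
	for ev := range events {
		h := fnv.New32a()
		h.Write([]byte(ev.pk))
		queues[h.Sum32()%uint32(workers)] <- ev // same PK -> same worker, order preserved
	}
	for _, q := range queues {
		close(q)
	}
	wg.Wait()
}

func main() {
	events := make(chan binlogEvent)
	go func() {
		for i := 0; i < 10; i++ {
			events <- binlogEvent{pk: fmt.Sprint(i % 3), sql: "UPDATE ..."}
		}
		close(events)
	}()
	applyParallel(events, 4, func(ev binlogEvent) { fmt.Println("apply:", ev.pk, ev.sql) })
}
```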
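Similarly, here is a sketch of the deferred-apply bookkeeping. The map key is a serialized, mem-comparable primary key, which is exactly why this doesn't work for varchar primary keys with collations. Again, names are illustrative, not an actual implementation.

```go
// Hypothetical sketch of deferred binlog apply: instead of applying events
// immediately, remember only which PKs were touched and whether the last
// event was a DELETE. After the row copy completes, re-read surviving rows
// from the source table and upsert them into the ghost table.
package main

import "fmt"

type deferredApplier struct {
	touched map[string]bool // mem-comparable PK -> true if last modification was a DELETE
}

func newDeferredApplier() *deferredApplier {
	return &deferredApplier{touched: make(map[string]bool)}
}

// record notes a binlog event; later events for the same PK overwrite earlier
// ones, which is what makes this a win for workloads that update the same rows.
func (d *deferredApplier) record(pk string, isDelete bool) {
	d.touched[pk] = isDelete
}

// flush walks the map once after the copy: deleted PKs are removed from the
// ghost table, everything else is re-read from the source and upserted.
func (d *deferredApplier) flush(deleteRow, upsertRow func(pk string)) {
	for pk, deleted := range d.touched {
		if deleted {
			deleteRow(pk)
		} else {
			upsertRow(pk)
		}
	}
}

func main() {
	d := newDeferredApplier()
	d.record("42", false) // UPDATE
	d.record("42", false) // UPDATE again: coalesced, applied once at the end
	d.record("99", true)  // DELETE
	d.flush(
		func(pk string) { fmt.Println("DELETE ghost row", pk) },
		func(pk string) { fmt.Println("REPLACE ghost row", pk) },
	)
}
```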
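And a sketch of the windowed ETA idea, assuming copied-row counts are sampled periodically. This is just the shape of the idea, not the linked PR's implementation.

```go
// Hypothetical sketch of a windowed ETA: instead of extrapolating from the
// start of the copy, divide the rows remaining by the copy rate observed
// over the last few minutes.
package main

import (
	"fmt"
	"time"
)

type sample struct {
	at         time.Time
	rowsCopied int64
}

type etaEstimator struct {
	window  time.Duration
	samples []sample
}

func (e *etaEstimator) observe(rowsCopied int64) {
	now := time.Now()
	e.samples = append(e.samples, sample{now, rowsCopied})
	// Drop samples older than the window so the rate reflects recent throughput.
	for len(e.samples) > 1 && now.Sub(e.samples[0].at) > e.window {
		e.samples = e.samples[1:]
	}
}

func (e *etaEstimator) eta(totalRows int64) (time.Duration, bool) {
	if len(e.samples) < 2 {
		return 0, false // not enough data yet
	}
	first, last := e.samples[0], e.samples[len(e.samples)-1]
	elapsed := last.at.Sub(first.at)
	copied := last.rowsCopied - first.rowsCopied
	if copied <= 0 || elapsed <= 0 {
		return 0, false
	}
	rate := float64(copied) / elapsed.Seconds() // rows/sec over the recent window
	remaining := float64(totalRows - last.rowsCopied)
	return time.Duration(remaining / rate * float64(time.Second)), true
}

func main() {
	e := &etaEstimator{window: 5 * time.Minute}
	e.observe(1000)
	time.Sleep(10 * time.Millisecond)
	e.observe(2000)
	if d, ok := e.eta(1_000_000); ok {
		fmt.Println("ETA:", d)
	}
}
```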
That's the raw idea list. There is a good chance we will be able to provide patches for some of these too, but I wanted to check in first so we can discuss. Maybe you have a few ideas of your own too? :-)
I have a proof-of-concept branch for dynamic chunk size @ https://github.com/github/gh-ost/compare/master...morgo:gh-ost:dynamic-chunk-size?expand=1
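The core of the branch is a feedback loop on chunk execution time. Here is a simplified sketch of the idea; the constants and names are illustrative, not necessarily what the branch does:

```go
// Sketch of a dynamic chunk size feedback loop: after each chunk, compare
// the observed copy time to the target (50ms by default here) and nudge the
// chunk size up or down, clamped to sane bounds.
package main

import (
	"fmt"
	"time"
)

const (
	targetChunkTime = 50 * time.Millisecond
	minChunkSize    = 100
	maxChunkSize    = 100_000
)

// nextChunkSize grows the chunk when we're under the target and shrinks it
// when we overshoot. The adjustment is dampened so a single outlier chunk
// can't swing the size wildly.
func nextChunkSize(current int64, lastChunkTook time.Duration) int64 {
	ratio := float64(targetChunkTime) / float64(lastChunkTook)
	if ratio > 1.5 {
		ratio = 1.5
	} else if ratio < 0.5 {
		ratio = 0.5
	}
	size := int64(float64(current) * ratio)
	if size < minChunkSize {
		size = minChunkSize
	}
	if size > maxChunkSize {
		size = maxChunkSize
	}
	return size
}

func main() {
	size := int64(1000)
	for _, took := range []time.Duration{20 * time.Millisecond, 30 * time.Millisecond, 80 * time.Millisecond} {
		size = nextChunkSize(size, took)
		fmt.Printf("chunk took %v -> next size %d\n", took, size)
	}
}
```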
Here's what I observed with some sample tables:
| Test | Default Chunk Size (1000) | Dynamic Chunk Size (default 50ms target) |
|---|---|---|
| Typical Table (stock) | 1m10s (copy) | 58s (copy) |
| Skinny Table (skinnytable) | 38s (copy) | 30s (copy) |
This is the time for the copy phase only. The test is also on localhost, so there might be marginal additional improvements on high-latency networks.
The table definitions are as follows:
```
mysql [localhost:8031] {msandbox} (test) > show create table stock\G
*************************** 1. row ***************************
Table: stock
Create Table: CREATE TABLE `stock` (
`s_i_id` int NOT NULL,
`s_w_id` int NOT NULL,
`s_quantity` int DEFAULT NULL,
`s_dist_01` char(24) DEFAULT NULL,
`s_dist_02` char(24) DEFAULT NULL,
`s_dist_03` char(24) DEFAULT NULL,
`s_dist_04` char(24) DEFAULT NULL,
`s_dist_05` char(24) DEFAULT NULL,
`s_dist_06` char(24) DEFAULT NULL,
`s_dist_07` char(24) DEFAULT NULL,
`s_dist_08` char(24) DEFAULT NULL,
`s_dist_09` char(24) DEFAULT NULL,
`s_dist_10` char(24) DEFAULT NULL,
`s_ytd` int DEFAULT NULL,
`s_order_cnt` int DEFAULT NULL,
`s_remote_cnt` int DEFAULT NULL,
`s_data` varchar(50) DEFAULT NULL,
PRIMARY KEY (`s_w_id`,`s_i_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)
mysql [localhost:8031] {msandbox} (test) > show create table skinnytable\G
*************************** 1. row ***************************
Table: skinnytable
Create Table: CREATE TABLE `skinnytable` (
`a` int NOT NULL AUTO_INCREMENT,
`b` int NOT NULL,
PRIMARY KEY (`a`)
) ENGINE=InnoDB AUTO_INCREMENT=14091066 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)
```
From the debug output, the target batch size for the skinny table hovered around 22K. For stock it was about 8K.