Performance Improvements (Raw Ideas)
We are looking to improve the performance of gh-ost so that we can safely operate with larger database sizes. Our use case is a bit different from GitHub's in that we do not use read-replicas. Some of the DBs also have very cyclic usage (i.e. only busy 9-5, M-F) and may have windows of free capacity.
I have a few ideas I wanted to run by you, since I'm sure some have come up before:
| Feature | Context | Status |
|---|---|---|
| Use 8.0 ALGORITHM=INSTANT when applicable | Some MySQL 8.0 DDL operations are instant and do not require gh-ost at all. | Merged, thank you kindly for the review! |
| Dynamic Chunk Size | gh-ost can observe the execution time of processing chunks and dynamically increase the chunk size while it stays below a target threshold. For our environment (because we have a lot of replica-lag tolerance) we typically run larger batch sizes, but we have varying DB instance sizes, so having this auto-tune is a win for us (a sketch of the tuning loop follows the PoC link below). | See PR here (and issue comment) |
| Multi-threaded applier | Parallel replication apply is much better in MySQL 8.0, and since we don't use read-replicas we can probably push more changes through the binlog than gh-ost currently does. We think we can tolerate a few minutes of replica lag; our limit is that Aurora restricts the relay log to ~1000M, and exceeding that reduces our DR capabilities. (Note: there's an earlier issue on this. It lacks the 8.0 parallel context, and the issue @shlomi-noach probably hit when he said it is slower is possibly this one? In any case, I've verified I can bulk-parallel insert with improved performance. See the first sketch after this table.) | Not started |
| Defer Binary Log Apply | Currently gh-ost prioritizes applying the binary log ahead of copying rows. I think it's possible to instead track, in memory, only the primary keys discovered in the binary log plus whether the last modification was a delete (a bool). If this is kept in a map, it can be applied after the copy is done. The benefit of this change is most evident in workloads that tend to update the same rows (a sketch follows this table). Edit: this optimization requires mem-comparable primary keys, so it won't work on varchar primary keys with collations. | Not started |
| Resume from failure | I know there is a stale PR for this. It doesn't improve performance, but it's semi-related since some of our long-running DDLs fail. We also like to cycle pods daily on our k8s clusters, so having single processes that run for two weeks complicates our infra. | See branch here. |
| Better ETA estimates | The current ETA estimator is based on estimatedTime - elapsedTime from the start of the copy. This skews poorly for larger tables, which become slower to insert into as they grow, and it won't respond well to changes as dynamic chunk size/throttling is introduced. Ideally the estimate compares how many rows are left to copy against how many rows were copied in the last few minutes (see the ETA sketch after this table). | See PR here |
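To make the multi-threaded applier idea concrete, here is a minimal sketch in Go (gh-ost's own language) of one way to parallelize the apply while preserving per-row ordering: hash each event's primary key to a fixed worker, so events for the same row are never reordered. All names here are hypothetical illustrations, not gh-ost's actual applier code.

```go
// Hypothetical sketch of a multi-threaded binlog applier: events for the
// same primary key must be applied in order, so we route each event to a
// worker chosen by hashing its PK. Names are illustrative, not gh-ost's.
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

type binlogEvent struct {
	pk  string // serialized primary key
	sql string // the DML to apply
}

func applyParallel(events <-chan binlogEvent, workers int, apply func(binlogEvent)) {
	queues := make([]chan binlogEvent, workers)
	var wg sync.WaitGroup
	for i := range queues {
		queues[i] = make(chan binlogEvent, 64)
		wg.Add(1)
		go func(q <-chan binlogEvent) {
			defer wg.Done()
			for ev := range q {
				apply(ev) // each worker would hold its own DB connection in practice
			}
		}(queues[i])
	}
	for ev := range events {
		h := fnv.New32a()
		h.Write([]byte(ev.pk))
		queues[h.Sum32()%uint32(workers)] <- ev // same PK -> same worker, order preserved
	}
	for _, q := range queues {
		close(q)
	}
	wg.Wait()
}

func main() {
	events := make(chan binlogEvent)
	go func() {
		for i := 0; i < 10; i++ {
			events <- binlogEvent{pk: fmt.Sprint(i % 3), sql: "UPDATE ..."}
		}
		close(events)
	}()
	applyParallel(events, 4, func(ev binlogEvent) { fmt.Println("apply:", ev.pk, ev.sql) })
}
```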
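Similarly, here is a sketch of the deferred-apply bookkeeping. The map key is a serialized, mem-comparable primary key, which is exactly why this doesn't work for varchar primary keys with collations. Again, names are illustrative, not an actual implementation.

```go
// Hypothetical sketch of deferred binlog apply: instead of applying events
// immediately, remember only which PKs were touched and whether the last
// event was a DELETE. After the row copy completes, re-read surviving rows
// from the source table and upsert them into the ghost table.
package main

import "fmt"

type deferredApplier struct {
	touched map[string]bool // mem-comparable PK -> true if last modification was a DELETE
}

func newDeferredApplier() *deferredApplier {
	return &deferredApplier{touched: make(map[string]bool)}
}

// record notes a binlog event; later events for the same PK overwrite earlier
// ones, which is what makes this a win for workloads that update the same rows.
func (d *deferredApplier) record(pk string, isDelete bool) {
	d.touched[pk] = isDelete
}

// flush walks the map once after the copy: deleted PKs are removed from the
// ghost table, everything else is re-read from the source and upserted.
func (d *deferredApplier) flush(deleteRow, upsertRow func(pk string)) {
	for pk, deleted := range d.touched {
		if deleted {
			deleteRow(pk)
		} else {
			upsertRow(pk)
		}
	}
}

func main() {
	d := newDeferredApplier()
	d.record("42", false) // UPDATE
	d.record("42", false) // UPDATE again: coalesced, applied once at the end
	d.record("99", true)  // DELETE
	d.flush(
		func(pk string) { fmt.Println("DELETE ghost row", pk) },
		func(pk string) { fmt.Println("REPLACE ghost row", pk) },
	)
}
```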
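And a sketch of the windowed ETA idea, assuming copied-row counts are sampled periodically. This is just the shape of the idea, not the linked PR's implementation.

```go
// Hypothetical sketch of a windowed ETA: instead of extrapolating from the
// start of the copy, divide the rows remaining by the copy rate observed
// over the last few minutes.
package main

import (
	"fmt"
	"time"
)

type sample struct {
	at         time.Time
	rowsCopied int64
}

type etaEstimator struct {
	window  time.Duration
	samples []sample
}

func (e *etaEstimator) observe(rowsCopied int64) {
	now := time.Now()
	e.samples = append(e.samples, sample{now, rowsCopied})
	// Drop samples older than the window so the rate reflects recent throughput.
	for len(e.samples) > 1 && now.Sub(e.samples[0].at) > e.window {
		e.samples = e.samples[1:]
	}
}

func (e *etaEstimator) eta(totalRows int64) (time.Duration, bool) {
	if len(e.samples) < 2 {
		return 0, false // not enough data yet
	}
	first, last := e.samples[0], e.samples[len(e.samples)-1]
	elapsed := last.at.Sub(first.at)
	copied := last.rowsCopied - first.rowsCopied
	if copied <= 0 || elapsed <= 0 {
		return 0, false
	}
	rate := float64(copied) / elapsed.Seconds() // rows/sec over the recent window
	remaining := float64(totalRows - last.rowsCopied)
	return time.Duration(remaining / rate * float64(time.Second)), true
}

func main() {
	e := &etaEstimator{window: 5 * time.Minute}
	e.observe(1000)
	time.Sleep(10 * time.Millisecond)
	e.observe(2000)
	if d, ok := e.eta(1_000_000); ok {
		fmt.Println("ETA:", d)
	}
}
```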
That's the raw idea list. There is a good chance we will be able to provide patches for some of these too, but I wanted to check in first so we can discuss. Maybe you have a few ideas of your own too? :-)
I have a proof-of-concept branch for dynamic chunk size @ https://github.com/github/gh-ost/compare/master...morgo:gh-ost:dynamic-chunk-size?expand=1
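The core of the branch is a feedback loop on chunk execution time. Here is a simplified sketch of the idea; the constants and names are illustrative, not necessarily what the branch does:

```go
// Sketch of a dynamic chunk size feedback loop: after each chunk, compare
// the observed copy time to the target (50ms by default here) and nudge the
// chunk size up or down, clamped to sane bounds.
package main

import (
	"fmt"
	"time"
)

const (
	targetChunkTime = 50 * time.Millisecond
	minChunkSize    = 100
	maxChunkSize    = 100_000
)

// nextChunkSize grows the chunk when we're under the target and shrinks it
// when we overshoot. The adjustment is dampened so a single outlier chunk
// can't swing the size wildly.
func nextChunkSize(current int64, lastChunkTook time.Duration) int64 {
	ratio := float64(targetChunkTime) / float64(lastChunkTook)
	if ratio > 1.5 {
		ratio = 1.5
	} else if ratio < 0.5 {
		ratio = 0.5
	}
	size := int64(float64(current) * ratio)
	if size < minChunkSize {
		size = minChunkSize
	}
	if size > maxChunkSize {
		size = maxChunkSize
	}
	return size
}

func main() {
	size := int64(1000)
	for _, took := range []time.Duration{20 * time.Millisecond, 30 * time.Millisecond, 80 * time.Millisecond} {
		size = nextChunkSize(size, took)
		fmt.Printf("chunk took %v -> next size %d\n", took, size)
	}
}
```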
Here's what I observed with some sample tables:
| Test | Default Chunk Size (1000) | Dynamic Chunk Size (default 50ms target) |
|---|---|---|
| Typical Table (stock) | 1m10s (copy) | 58s (copy) |
| Skinny Table (skinnytable) | 38s (copy) | 30s (copy) |
This is the time for the copy phase only. The test is also on localhost, so there might be marginal additional improvements on high-latency networks.
The table definitions are as follows:
```
mysql [localhost:8031] {msandbox} (test) > show create table stock\G
*************************** 1. row ***************************
Table: stock
Create Table: CREATE TABLE `stock` (
`s_i_id` int NOT NULL,
`s_w_id` int NOT NULL,
`s_quantity` int DEFAULT NULL,
`s_dist_01` char(24) DEFAULT NULL,
`s_dist_02` char(24) DEFAULT NULL,
`s_dist_03` char(24) DEFAULT NULL,
`s_dist_04` char(24) DEFAULT NULL,
`s_dist_05` char(24) DEFAULT NULL,
`s_dist_06` char(24) DEFAULT NULL,
`s_dist_07` char(24) DEFAULT NULL,
`s_dist_08` char(24) DEFAULT NULL,
`s_dist_09` char(24) DEFAULT NULL,
`s_dist_10` char(24) DEFAULT NULL,
`s_ytd` int DEFAULT NULL,
`s_order_cnt` int DEFAULT NULL,
`s_remote_cnt` int DEFAULT NULL,
`s_data` varchar(50) DEFAULT NULL,
PRIMARY KEY (`s_w_id`,`s_i_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)
mysql [localhost:8031] {msandbox} (test) > show create table skinnytable\G
*************************** 1. row ***************************
Table: skinnytable
Create Table: CREATE TABLE `skinnytable` (
`a` int NOT NULL AUTO_INCREMENT,
`b` int NOT NULL,
PRIMARY KEY (`a`)
) ENGINE=InnoDB AUTO_INCREMENT=14091066 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)
```
From the debug output, the target batch size for the skinny table hovered around 22K. For stock it was about 8K.