influx-cli

Speed up backup process by downloading multiple Shard Groups in parallel

Open TwentyFiveSoftware opened this issue 3 years ago • 1 comments

Currently, the backup process downloads one shard at a time from the Influx API and stores it on the file system. This tends to be very slow on larger databases because it does not take advantage of the available IO capacity, which could speed up the process considerably.

This PR introduces a pool of workers that download shards in parallel. The work is split at the shard-group level rather than within a group: in the Influx OSS version a shard group holds only a single shard, so parallelizing within a group would bring no benefit.
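As a rough illustration of the idea (not the code in this PR), a worker pool over shard groups could look like the sketch below. `ShardGroup`, `backupParallel`, `ErrSimulated`, and the `fetch` callback are hypothetical names made up for this example; the real influx-cli types and API calls differ.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ShardGroup is a hypothetical stand-in for the backup manifest entry;
// it is not the influx-cli type.
type ShardGroup struct {
	ID       int
	ShardIDs []int
}

// ErrSimulated is only used to demonstrate error handling in this sketch.
var ErrSimulated = errors.New("simulated download failure")

// backupParallel distributes shard groups over a fixed number of workers.
// Each worker downloads the shards of one group sequentially via fetch
// (a hypothetical callback standing in for the API download), so the
// parallelism is across groups, not within a group.
// It returns the first error of every failed group, keyed by group ID.
func backupParallel(groups []ShardGroup, workers int, fetch func(shardID int) error) map[int]error {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan ShardGroup)
	failed := make(map[int]error)
	var (
		wg sync.WaitGroup
		mu sync.Mutex // guards failed
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for g := range jobs {
				for _, id := range g.ShardIDs {
					if err := fetch(id); err != nil {
						mu.Lock()
						failed[g.ID] = err
						mu.Unlock()
						break // skip the remaining shards of this group
					}
				}
			}
		}()
	}
	for _, g := range groups {
		jobs <- g
	}
	close(jobs) // lets the workers exit once the queue is drained
	wg.Wait()
	return failed
}

func main() {
	groups := []ShardGroup{
		{ID: 1, ShardIDs: []int{10}},
		{ID: 2, ShardIDs: []int{20}},
		{ID: 3, ShardIDs: []int{30}},
	}
	failed := backupParallel(groups, 2, func(id int) error {
		if id == 20 {
			return ErrSimulated
		}
		return nil
	})
	fmt.Println("failed groups:", len(failed))
}
```

The unbuffered `jobs` channel keeps at most `workers` downloads in flight, which is one simple way to bound the IO load; the appropriate worker count depends on the system and is left as a parameter here.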

In my benchmark, run in a VM on my machine with limited IO capacity, the parallelized backup is already 2 to 3 times faster. The speedup is probably even larger on a beefier system.

TwentyFiveSoftware, Mar 08 '22 15:03

Closes #366

TwentyFiveSoftware, Mar 08 '22 16:03