backup-utils
Operations CPU constrained on gzip
While investigating why our GHE backups were taking so long, I noticed that various operations are CPU constrained on gzip compression and decompression. For example, when dumping the MySQL database, I saw the MySQL process consume ~14 minutes of CPU time in total, while the gzip process it was piping to consumed ~45 minutes of CPU time! Put another way, if the compressor could keep up with the rate at which MySQL is able to emit the dump, GHE backups would complete ~30 minutes faster on this instance for the MySQL data alone. On the decompression side, decompressing the MySQL archive with gzip consumed ~39 minutes of CPU time.
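The ~30 minute figure follows from treating the dump as a two-stage pipeline. A rough sketch of that arithmetic is below; it assumes both stages are single-threaded and run concurrently, so the pipeline takes about as long as its slowest stage. The function name `pipeline_savings` is hypothetical and only used for illustration.

```python
def pipeline_savings(producer_cpu_min, compressor_cpu_min):
    """Estimate minutes saved if the compressor kept pace with the producer.

    Assumption: each stage is single-threaded and the stages overlap, so
    wall-clock time is roughly the CPU time of the slowest stage.
    """
    current = max(producer_cpu_min, compressor_cpu_min)
    ideal = producer_cpu_min  # compressor is no longer the bottleneck
    return current - ideal


# Figures from the MySQL dump above: ~14 min for MySQL, ~45 min for gzip.
print(pipeline_savings(14, 45))  # 31
```

This is only an estimate: disk or network I/O could become the next bottleneck once compression stops being one.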
Large backup operations could be substantially faster if a modern, faster compression library were used. I personally recommend zstd, which yields better and faster compression than zlib/gzip at default compression levels. I compressed the MySQL dump of our GHE instance with zstd (level 3, the default) and the resulting archive was ~75% the size of the gzip version and took far less CPU time to compress. On the decompression side, zstd required ~130s of CPU time versus ~2,400s for gzip.
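For anyone who wants to reproduce this kind of comparison on their own data, here is a minimal sketch. It is not how backup-utils works and not how the numbers above were measured. It assumes zlib at level 6 is a fair stand-in for gzip's default setting, and it only runs the zstd case when the third-party `zstandard` package is installed. The helper names (`cpu_seconds`, `benchmark`, `make_sample`, `run`) and the synthetic sample data are hypothetical.

```python
import time
import zlib


def cpu_seconds(func, payload):
    """Return (result, CPU seconds used by this process) for func(payload)."""
    start = time.process_time()
    result = func(payload)
    return result, time.process_time() - start


def benchmark(name, compress, decompress, payload):
    """Measure size ratio and CPU time for one compress/decompress pair."""
    packed, c_cpu = cpu_seconds(compress, payload)
    unpacked, d_cpu = cpu_seconds(decompress, packed)
    assert unpacked == payload, "round trip must be lossless"
    return {
        "name": name,
        "ratio": len(packed) / len(payload),
        "compress_cpu_s": c_cpu,
        "decompress_cpu_s": d_cpu,
    }


def make_sample(rows=20000):
    """Synthetic stand-in for a SQL dump; real tests should use real data."""
    line = "INSERT INTO t VALUES ({0}, 'user{0}', 'user{0}@example.com');\n"
    return "".join(line.format(i) for i in range(rows)).encode()


def run(payload):
    """Benchmark zlib level 6, plus zstd level 3 if zstandard is available."""
    results = [
        benchmark("zlib-6", lambda d: zlib.compress(d, 6), zlib.decompress, payload)
    ]
    try:
        import zstandard  # third-party and optional; an assumption of this sketch
    except ImportError:
        return results
    cctx = zstandard.ZstdCompressor(level=3)
    dctx = zstandard.ZstdDecompressor()
    results.append(benchmark("zstd-3", cctx.compress, dctx.decompress, payload))
    return results


if __name__ == "__main__":
    for row in run(make_sample()):
        print(row)
```

Results on a small synthetic sample will not match the figures reported here; a meaningful comparison needs a real dump of representative size.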
In summary, various GHE backup operations are CPU constrained by gzip compression. Replacing gzip with something more modern like zstd will make these operations faster, substantially so on larger GHE instances.