
Operations CPU constrained on gzip

Open indygreg opened this issue 5 years ago • 0 comments

When looking at why our GHE backups were taking so long, I noticed that various operations are CPU constrained on gzip compression/decompression. For example, when dumping the MySQL database, I witnessed the MySQL process consume ~14 minutes of CPU time in total and the gzip process it was piping to consume ~45 minutes of CPU time! Put another way, if the compressor could keep up with the rate at which the MySQL dump is capable of emitting data, GHE backups would complete ~30 minutes faster on this instance on the MySQL data alone. On the decompression side, the MySQL archive consumed ~39 minutes of CPU time with gzip.

Large backup operations could be substantially faster if a modern, faster compression library were used. I personally recommend zstd, which yields better and faster compression than zlib/gzip at default/normal compression levels. I compressed the MySQL dump of our GHE instance with zstd (using level 3, the default) and the resulting archive was ~75% the size of the gzip version and took far less CPU to compress. On the decompression side, it required ~130s of CPU versus ~2,400s for gzip.
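For anyone who wants to reproduce this kind of comparison on their own data, here is a minimal sketch in Python. It is not how backup-utils works and not the exact method I used above. It measures in-process CPU time for gzip (at level 6, the `gzip` CLI default) and, if the third-party `zstandard` package happens to be installed, zstd at level 3. The `cpu_seconds` and `compare` helpers and the sample payload are hypothetical names made up for illustration, and numbers from a small synthetic payload will not match a real MySQL dump.

```python
import gzip
import time

try:
    import zstandard  # third-party package; assumed installed, otherwise skipped
except ImportError:
    zstandard = None


def cpu_seconds(fn, payload):
    """Run fn(payload) and return (result, CPU seconds used by this process)."""
    start = time.process_time()
    result = fn(payload)
    return result, time.process_time() - start


def compare(data):
    """Compress and decompress data with each available codec; return stats."""
    codecs = {
        # Level 6 mirrors the gzip command-line default.
        "gzip-6": (lambda b: gzip.compress(b, compresslevel=6), gzip.decompress),
    }
    if zstandard is not None:
        codecs["zstd-3"] = (
            zstandard.ZstdCompressor(level=3).compress,
            zstandard.ZstdDecompressor().decompress,
        )

    report = {}
    for name, (compress, decompress) in codecs.items():
        packed, compress_cpu = cpu_seconds(compress, data)
        unpacked, decompress_cpu = cpu_seconds(decompress, packed)
        if unpacked != data:
            raise ValueError(f"{name} round trip did not reproduce the input")
        report[name] = {
            "ratio": len(packed) / len(data),
            "compress_cpu_s": compress_cpu,
            "decompress_cpu_s": decompress_cpu,
        }
    return report


if __name__ == "__main__":
    # Hypothetical stand-in for a SQL dump; substitute real data for real numbers.
    sample = b"INSERT INTO repositories VALUES (1, 'example', 'owner');\n" * 200_000
    for name, stats in compare(sample).items():
        print(name, stats)
```

`time.process_time()` counts CPU time rather than wall-clock time, which is the quantity that matters when the compressor is the bottleneck in a pipe.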

In summary, various GHE backup operations are CPU constrained by gzip compression. Replacing gzip with something more modern like zstd will make these operations faster, substantially so on larger GHE instances.

indygreg, May 22 '19 19:05