sqld

bottomless: add xz compression option

Open: psarna opened this issue 2 years ago, 7 comments

Empirical testing shows that gzip achieves a mere 2x compression ratio, even on very simple, repetitive data patterns. Since compression is very important for optimizing our egress traffic and overall throughput, this PR implements the .xz algorithm as well. On the same data set, xz achieved a ~50x compression ratio, roughly 25 times better than gzip, at the cost of higher CPU usage.

Note: with more algorithms implemented, we should also consider adding code that detects which compression method was used when restoring a snapshot, so that we can restore from a gzip file but continue taking new snapshots with xz. Currently, setting the compression method via the env var assumes that restore and backup use the same algorithm.
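One way to do this detection is to sniff the snapshot's magic bytes instead of trusting the env var: gzip data starts with 1f 8b and .xz data with FD 37 7A 58 5A 00. The Rust sketch below assumes that approach; Compression, detect_compression and plan are hypothetical names, not bottomless APIs, and treating unrecognized data as uncompressed is an assumption.

```rust
/// Compression schemes a snapshot may use (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Compression {
    Uncompressed,
    Gzip,
    Xz,
}

/// gzip members start with the two ID bytes 0x1f 0x8b (RFC 1952).
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];
/// .xz streams start with FD 37 7A 58 5A 00 (0xFD, "7zXZ", 0x00).
const XZ_MAGIC: [u8; 6] = [0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00];

/// Guess a snapshot's compression from its first bytes.
/// Assumption: anything matching neither magic number is uncompressed.
fn detect_compression(header: &[u8]) -> Compression {
    if header.starts_with(&XZ_MAGIC) {
        Compression::Xz
    } else if header.starts_with(&GZIP_MAGIC) {
        Compression::Gzip
    } else {
        Compression::Uncompressed
    }
}

/// Hypothetical helper: decompress the restored snapshot with whatever it was
/// written in, but keep compressing new backups with the configured scheme.
fn plan(snapshot_header: &[u8], configured: Compression) -> (Compression, Compression) {
    (detect_compression(snapshot_header), configured)
}

fn main() {
    // First bytes of an older, gzip-compressed snapshot.
    let old_snapshot: [u8; 4] = [0x1f, 0x8b, 0x08, 0x00];
    let (restore_with, backup_with) = plan(&old_snapshot, Compression::Xz);
    println!("restore with {:?}, back up with {:?}", restore_with, backup_with);
}
```

Decoupling the two values is the point: the restore side follows the data, while the env var only governs what gets written from now on.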

psarna, Oct 16 '23 10:10

TODO: I still need to go over the code and check that there are no remaining hardcoded assumptions that backups use gzip.

psarna, Oct 16 '23 10:10

Via an env var: LIBSQL_BOTTOMLESS_COMPRESSION=xz. But before we go ahead with this, I think I need to add code that detects the previous compression scheme on restore. Without it, it will be impossible to restore from a .gz file while using xz for all new backups.
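A minimal sketch of reading that variable in Rust with only std::env. The accepted values ("gzip", "xz"), the case-insensitive matching, and the gzip fallback when the variable is unset are assumptions for illustration; parse_compression is a hypothetical helper, not the actual bottomless code.

```rust
use std::env;

/// Hypothetical enum for the schemes discussed in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Compression {
    Gzip,
    Xz,
}

/// Map the raw value of LIBSQL_BOTTOMLESS_COMPRESSION to a scheme.
/// Assumptions: unset means gzip (the pre-existing behaviour), and
/// matching is case-insensitive with surrounding whitespace ignored.
fn parse_compression(value: Option<&str>) -> Result<Compression, String> {
    match value.map(|v| v.trim().to_ascii_lowercase()) {
        None => Ok(Compression::Gzip),
        Some(v) => match v.as_str() {
            "gzip" => Ok(Compression::Gzip),
            "xz" => Ok(Compression::Xz),
            other => Err(format!("unsupported compression: {other}")),
        },
    }
}

fn main() {
    // e.g. LIBSQL_BOTTOMLESS_COMPRESSION=xz
    let raw = env::var("LIBSQL_BOTTOMLESS_COMPRESSION").ok();
    match parse_compression(raw.as_deref()) {
        Ok(scheme) => println!("new backups will use {:?}", scheme),
        Err(e) => eprintln!("{e}"),
    }
}
```

Per the comment above, this value should only drive new backups; restore would still need to detect the scheme of the existing snapshot.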

psarna, Oct 17 '23 08:10

I'm getting corrupted .xz files when using this crate with the "Best" compression level. Let me try the default level, but that's odd: a file compressed with the crate didn't unpack properly with the xz -d shell command, which is suspicious.
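To flag obviously broken output in a test without a decoder, a cheap structural check is possible: per the .xz file format, a stream starts with the header magic FD 37 7A 58 5A 00 and ends with the footer magic "YZ", optionally followed by zero-byte Stream Padding. The hypothetical looks_like_complete_xz below checks only that. It catches truncated output, not bad block data or CRC mismatches, so xz -t (or xz -d, as above) remains the real test, and it does not by itself explain the "Best" level issue.

```rust
/// .xz Stream Header magic: 0xFD, "7zXZ", 0x00.
const XZ_HEADER_MAGIC: [u8; 6] = [0xfd, b'7', b'z', b'X', b'Z', 0x00];
/// .xz Stream Footer magic: the last two bytes of every stream.
const XZ_FOOTER_MAGIC: [u8; 2] = [b'Y', b'Z'];
/// Stream Header and Stream Footer are 12 bytes each.
const MIN_STREAM_LEN: usize = 24;

/// Hypothetical sanity check: does `data` start with a Stream Header and end
/// with a Stream Footer once trailing zero-byte Stream Padding is skipped?
/// This catches truncated output only; it does not verify CRCs or blocks.
fn looks_like_complete_xz(data: &[u8]) -> bool {
    if data.len() < MIN_STREAM_LEN || !data.starts_with(&XZ_HEADER_MAGIC) {
        return false;
    }
    // Index just past the last non-zero byte (Stream Padding is all zeros).
    let end = data.iter().rposition(|&b| b != 0).map_or(0, |i| i + 1);
    data[..end].ends_with(&XZ_FOOTER_MAGIC)
}

fn main() {
    // A header followed by nothing but zeros: no footer was ever written.
    let mut truncated = XZ_HEADER_MAGIC.to_vec();
    truncated.extend_from_slice(&[0u8; 20]);
    println!("truncated output passes: {}", looks_like_complete_xz(&truncated));
}
```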

psarna, Oct 17 '23 11:10

(yep, the default compression level works, and its ratio looks only ~10% worse than Best)

psarna, Oct 17 '23 11:10

There's one more place where compression isn't correctly autodetected: loading main db snapshots. I'll add the code.
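If a snapshot's compression is reflected in its object name, autodetection can be a lookup over the keys listed for a generation. The key names below ("db.gz" and "db.xz" under a generation prefix, plus the WAL-style key in main) are hypothetical, as are snapshot_compression and find_snapshot; the actual bottomless layout may differ.

```rust
/// Hypothetical enum for the schemes discussed in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Compression {
    Gzip,
    Xz,
}

/// Infer the compression of a main db snapshot from its object key.
/// Assumption: snapshots are stored as "<generation>/db.gz" or
/// "<generation>/db.xz"; any other key is not a main db snapshot.
fn snapshot_compression(key: &str) -> Option<Compression> {
    // rsplit always yields at least one item: the part after the last '/'.
    let name = key.rsplit('/').next()?;
    match name {
        "db.gz" => Some(Compression::Gzip),
        "db.xz" => Some(Compression::Xz),
        _ => None,
    }
}

/// Return the first key in a generation's listing that is a main db
/// snapshot, together with the compression to decode it with.
fn find_snapshot<'a>(keys: &[&'a str]) -> Option<(&'a str, Compression)> {
    keys.iter()
        .find_map(|&key| snapshot_compression(key).map(|c| (key, c)))
}

fn main() {
    // A generation written before the xz option existed.
    let listing = ["gen-1/db.gz", "gen-1/000001-000010.gz"];
    match find_snapshot(&listing) {
        Some((key, c)) => println!("restore {} using {:?}", key, c),
        None => println!("no main db snapshot found"),
    }
}
```

Name-based detection avoids fetching any bytes before choosing a decoder, but it only works if the writer names objects consistently; sniffing magic bytes is the fallback when it doesn't.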

psarna, Oct 17 '23 11:10

k, done

psarna, Oct 17 '23 14:10

Transplanted to the new repo: https://github.com/tursodatabase/libsql/pull/468

psarna, Oct 17 '23 14:10