
Switch from zlib to zstd for backup compression

Open • DemiMarie opened this issue 2 years ago • 11 comments


The problem you're addressing (if any)

zlib compression is slow and is often the bottleneck during backup generation (as per top(1)).

The solution you'd like

Use zstd compression instead, which is significantly faster and can natively use multiple CPU cores.

The value to a user, and who that user might be

All users will benefit from faster backups.

DemiMarie, May 13 '23 15:05

I don't know much about compression, but the Wikipedia articles for zlib and zstd seem to indicate that the former refers to both a library and an algorithm, whereas the latter refers to just an algorithm (the compressed files of which typically use the .zst extension). Is this accurate? Are you proposing changing the default compression algorithm used by qvm-backup?
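For context on the naming: zlib is a library, and it also names a container format (RFC 1950) around DEFLATE-compressed data; gzip files (RFC 1952) wrap the same DEFLATE stream in a different header and trailer. The Python sketch below (Python chosen only for illustration) shows that the standard-library `zlib` module can emit either container from the same compressor, selected by its `wbits` parameter. The `deflate` helper and the `WBITS` table are made up for this example.

```python
import gzip
import zlib

# wbits selects the container around the DEFLATE stream:
#   9..15   -> zlib header/trailer (RFC 1950)
#   25..31  -> gzip header/trailer (RFC 1952), i.e. 16 + window bits
#   -9..-15 -> raw DEFLATE with no header at all
WBITS = {"zlib": 15, "gzip": 31, "raw": -15}


def deflate(data: bytes, container: str, level: int = 6) -> bytes:
    """Compress with DEFLATE and wrap the result in the requested container (hypothetical helper)."""
    compressor = zlib.compressobj(level=level, method=zlib.DEFLATED, wbits=WBITS[container])
    return compressor.compress(data) + compressor.flush()


if __name__ == "__main__":
    payload = b"Qubes backup chunk " * 1000
    as_gzip = deflate(payload, "gzip")
    # The gzip container is readable by the gzip module (and by gunzip).
    assert gzip.decompress(as_gzip) == payload
    print(as_gzip[:2].hex())  # prints 1f8b, the gzip magic bytes
```

So "zlib compression" and "gzip compression" in this thread refer to the same underlying DEFLATE algorithm; only the wrapper differs.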

andrewdavidwong, May 13 '23 17:05

> Are you proposing changing the default compression algorithm used by qvm-backup?

Yes, for performance reasons. Zstd has better compression and compresses faster.

DemiMarie, May 14 '23 00:05

>> Are you proposing changing the default compression algorithm used by qvm-backup?

> Yes, for performance reasons. Zstd has better compression and compresses faster.

Performance is not the only consideration. Such a change would have significant implications for the ability to recover data from Qubes backups in emergency scenarios. Since gzip is ubiquitous, while zstd is comparatively new, users will have to store some kind of zstd binary with their backups or risk their data being unrecoverable in such scenarios.

andrewdavidwong, May 14 '23 20:05

>>> Are you proposing changing the default compression algorithm used by qvm-backup?

>> Yes, for performance reasons. Zstd has better compression and compresses faster.

> Performance is not the only consideration. Such a change would have significant implications for the ability to recover data from Qubes backups in emergency scenarios. Since gzip is ubiquitous, while zstd is comparatively new, users will have to store some kind of zstd binary with their backups or risk their data being unrecoverable in such scenarios.

zstd is available on pretty much every Linux distribution IIUC.

DemiMarie, May 15 '23 23:05

> zstd is available on pretty much every Linux distribution IIUC.

Only in recent years, according to this: https://en.wikipedia.org/wiki/Zstd#Usage

It seems like it's still somewhat experimental and in the process of being rolled out.

Also, "available on" does not necessarily mean "preinstalled by default," which is a safe assumption for gzip.

In many emergency scenarios, the user may only have access to an older computer or an older installation medium (e.g., a Linux ISO on a USB drive or disc that's a few years old).
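In a recovery situation like this, the first practical question is which decompressor the data actually needs. Once a file has been extracted (and, for an encrypted Qubes backup, decrypted), its leading magic bytes identify the format. Below is a minimal Python sketch: the signatures are the standard ones for gzip, zstd frames, bzip2 and xz, while the helper names and the idea of checking `PATH` with `shutil.which` are illustrative assumptions, not part of any Qubes tool.

```python
import shutil
import sys
from typing import Optional

# Standard magic numbers found at the start of each format.
MAGIC_SIGNATURES = (
    (b"\x1f\x8b", "gzip"),
    (b"\x28\xb5\x2f\xfd", "zstd"),  # zstd frame magic 0xFD2FB528, stored little-endian
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
)


def guess_compression(head: bytes) -> Optional[str]:
    """Return the format whose signature matches the first bytes, or None."""
    for magic, name in MAGIC_SIGNATURES:
        if head.startswith(magic):
            return name
    return None


def describe(path: str) -> str:
    """Report the detected format and whether a same-named tool is on PATH (hypothetical helper)."""
    with open(path, "rb") as f:
        fmt = guess_compression(f.read(6))
    if fmt is None:
        return f"{path}: unknown or uncompressed"
    status = "available" if shutil.which(fmt) else "NOT installed"
    return f"{path}: {fmt} ({fmt} tool {status})"


if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(describe(p))
```

On an old rescue system, a result like "zstd (zstd tool NOT installed)" is exactly the situation described above.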

andrewdavidwong, May 16 '23 08:05

I understand the concerns about emergency recovery with zstd, but on the other hand, the performance benefits of zstd over gzip (both in compression speed and ratio) are pretty impressive, and many users would probably like to benefit from them. Would it be possible to just give users the choice between gzip and zstd? That way, users concerned about emergency recovery with an old Linux ISO can still use gzip, while users more concerned about performance can switch to zstd.

Also, for zstd there should probably also be an option to change the compression level. Based on the benchmarks (and the "Compression Speed vs Ratio" diagram) at http://facebook.github.io/zstd/, different users may want to use different tradeoffs between speed and compression ratio.

And one more implementation note: zstd readily supports multi-threaded compression; it's probably a good idea to enable this (e.g. by passing the -T0 parameter) when adding zstd support.
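To make the suggestion concrete, here is a minimal Python sketch of how a backup frontend might build a zstd filter command with a configurable level and `-T0`. It only uses documented zstd CLI flags (`-#` for the level, `-T#` for worker threads, `-q`, `-c`, `-d`); the helper names are hypothetical, and this is not how qvm-backup currently passes arguments to its compression filter.

```python
import shutil
import subprocess
from typing import List


def zstd_filter_argv(level: int = 3, threads: int = 0, decompress: bool = False) -> List[str]:
    """Build a zstd command that reads stdin and writes stdout (hypothetical helper).

    threads=0 becomes -T0, which asks zstd to use as many threads as detected cores.
    """
    if decompress:
        # No level or thread count is passed for decompression here.
        return ["zstd", "-d", "-q", "-c"]
    if not 1 <= level <= 19:
        # Levels 20-22 additionally need --ultra; negative levels use --fast.
        raise ValueError(f"unsupported level {level}; expected 1..19")
    return ["zstd", f"-{level}", f"-T{threads}", "-q", "-c"]


def run_filter(argv: List[str], data: bytes) -> bytes:
    """Pipe data through an external filter program and return its stdout."""
    result = subprocess.run(argv, input=data, stdout=subprocess.PIPE, check=True)
    return result.stdout


if __name__ == "__main__" and shutil.which("zstd"):
    payload = b"qube image data " * 10000
    packed = run_filter(zstd_filter_argv(level=3), payload)
    # Round-trip check: decompressing must give back the original bytes.
    assert run_filter(zstd_filter_argv(decompress=True), packed) == payload
    print(f"{len(payload)} -> {len(packed)} bytes")
```

For what it's worth, the zstd man page describes `zstdmt` as equivalent to `zstd -T0`, so either spelling gives multi-threaded compression.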

jakoblell, May 18 '23 07:05

> Would it be possible to just give users the choice between gzip and zstd?

Isn't the option already available? For example, you can already do qvm-backup --compress-filter bzip2 to use bzip2 instead of gzip. I've been using this for years, and it works great. I haven't tried zstd, but I was under the impression that this could be used for any compression filter available in dom0.

> Also, for zstd there should probably also be an option to change the compression level [...]

> And one more implementation note: zstd readily supports multi-threaded compression; it's probably a good idea to enable this (e.g. by passing the -T0 parameter) when adding zstd support.

It might already be possible to pass sub-arguments when using the --compress-filter option, but I'm not certain. I vaguely recall experimenting with this many years ago and being able to do it.

andrewdavidwong, May 19 '23 00:05

> Isn't the option already available? For example, you can already do qvm-backup --compress-filter bzip2 to use bzip2 instead of gzip. I've been using this for years, and it works great.

Many users are using the GUI for doing backups, and there is no choice at all for the compression algorithm there; you can only enable/disable gzip compression in the GUI. It would be great to have a choice there to use zstd with a configurable compression level.

> I haven't tried zstd, but I was under the impression that this could be used for any compression filter available in dom0.

I haven't tried it either, but in any case the restore operation currently doesn't support zstd (even if the header indicates zstd compression), since it is not listed in KNOWN_COMPRESSION_FILTERS here: https://github.com/QubesOS/qubes-core-admin-client/blob/ba9b24db90c1b09826b6fcff61f98941565a2824/qubesadmin/backup/restore.py#L68

jakoblell, May 19 '23 08:05

> Many users are using the GUI for doing backups, and there is no choice at all for the compression algorithm there; you can only enable/disable gzip compression in the GUI. It would be great to have a choice there to use zstd with a configurable compression level.

That should be a separate feature request, since it would presumably allow for specifying any supported compression filter (and perhaps a compression level for that compression filter, if applicable), not just zstd. I thought we already had a separate issue for this, but I wasn't able to find one just now. Please feel free to open one, if you still wish to.

(Found a somewhat-related issue while searching: https://github.com/QubesOS/qubes-issues/issues/3865)

> I haven't tried it either, but in any case the restore operation currently doesn't support zstd (even if the header indicates zstd compression), since it is not listed in KNOWN_COMPRESSION_FILTERS here: https://github.com/QubesOS/qubes-core-admin-client/blob/ba9b24db90c1b09826b6fcff61f98941565a2824/qubesadmin/backup/restore.py#L68

Ah, I see. Thank you for pointing that out.

andrewdavidwong, May 20 '23 03:05

> I haven't tried it either, but in any case the restore operation currently doesn't support zstd (even if the header indicates zstd compression), since it is not listed in KNOWN_COMPRESSION_FILTERS here: https://github.com/QubesOS/qubes-core-admin-client/blob/ba9b24db90c1b09826b6fcff61f98941565a2824/qubesadmin/backup/restore.py#L68

That's only partially true. zstd will not be automatically accepted, but it will work if you use --compress-filter zstd during restore too.
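The behavior described here amounts to an allowlist with an explicit override: a filter named in the backup header is used automatically only if it is on the known list, while a user who names the filter explicitly can restore with something else. A minimal Python sketch of that decision logic follows. It is not the actual restore.py code; the contents of the hypothetical `KNOWN_FILTERS` tuple and the choice to reject a mismatch between the header and the user's choice are assumptions. Presumably the point of the allowlist is that a program name read from a backup header should not be executed without the user's consent.

```python
from typing import Optional, Tuple

# Hypothetical allowlist for illustration; the real KNOWN_COMPRESSION_FILTERS may differ.
KNOWN_FILTERS: Tuple[str, ...] = ("gzip", "bzip2", "xz")


def select_decompressor(header_filter: str,
                        user_filter: Optional[str] = None,
                        known: Tuple[str, ...] = KNOWN_FILTERS) -> str:
    """Decide which filter program to run when restoring (hypothetical helper)."""
    if user_filter is not None:
        # Explicit user choice: accepted even if it is not on the allowlist,
        # but it must agree with what the backup header says was used.
        if user_filter != header_filter:
            raise ValueError(
                f"backup header says {header_filter!r}, but {user_filter!r} was requested")
        return user_filter
    if header_filter in known:
        # Known filter: safe to use automatically.
        return header_filter
    raise ValueError(
        f"compression filter {header_filter!r} is not on the known list; "
        "pass it explicitly to accept it")
```

With this sketch, `select_decompressor("zstd")` is rejected, while `select_decompressor("zstd", user_filter="zstd")` succeeds, mirroring the `--compress-filter zstd` workaround.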

marmarek, May 20 '23 10:05

> That's only partially true. zstd will not be automatically accepted, but it will work if you use --compress-filter zstd during restore too.

OK. With the patch to core-admin-client, the above is no longer necessary: qvm-backup-restore will automatically detect and select zstd during restore if the backup was compressed with zstd, and it will ask the user to install zstd if it is missing.
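The auto-detection described here needs two steps: read the compression filter recorded in the backup header, then check that the tool exists before starting the restore. The Python sketch below assumes a simple `key=value` header with `compressed` and `compression-filter` keys; treat that layout, the helper names, and the wording of the message as assumptions rather than the actual core-admin-client patch.

```python
import shutil
from typing import Callable, Dict, Optional


def parse_header(text: str) -> Dict[str, str]:
    """Parse 'key=value' lines into a dict, skipping lines without '=' (assumed format)."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def required_filter(fields: Dict[str, str]) -> Optional[str]:
    """Return the filter recorded in the header, or None for an uncompressed backup."""
    if fields.get("compressed", "False") != "True":
        return None
    return fields.get("compression-filter")


def missing_tool_message(name: str,
                         which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return a prompt asking the user to install the tool, or None if it is on PATH."""
    if which(name) is None:
        return (f"This backup was compressed with {name}, which is not installed. "
                f"Please install {name} and retry the restore.")
    return None
```

Injecting `which` keeps the availability check testable without depending on what happens to be installed in dom0.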

With the other patch to Qube Manager (backup), which is merged but not yet released, users will be able to select zstd or zstdmt (multi-threaded) for backups if zstd is installed, and the tool will remember their selection.

On the original issue: zstdmt's performance is not comparable with zlib's; zstdmt is far ahead, and it is not even close. But pushing or forcing it on users is counterintuitive. I believe it would be better to include the zstd package in the default installation and let users experiment for themselves. Then they will come back on their own and ask for it to be the default.

alimirjamali, Jun 02 '25 14:06