
Explore options on zstd for compression performance

Open gtkramer opened this issue 5 years ago • 15 comments

Let's explore whether zstd's options affect compression performance, from either a CPU or a memory perspective, and whether they can make it better than xz by enough to justify switching to zstd by default.

gtkramer avatar Nov 08 '19 18:11 gtkramer

I explored various zstd flags, such as fast, adapt, and ultra, and tried different levels. The size vs. time trade-off is not good enough to justify switching to zstd as the default.

ashleshaAtrey avatar Nov 12 '19 20:11 ashleshaAtrey

@ashleshaAtrey I'm interested in whether you still have those numbers around. It might be worth exposing this as a configuration option to the user. Some may find the trade-off worthwhile depending on the particular content they are providing, as eventually this would be used by 3rd-party bundles as well.

bryteise avatar Nov 12 '19 22:11 bryteise

@bryteise It is already exposed. You can use it by setting COMPRESSION = ["external-zstd"] in builder.conf

rchiossi avatar Nov 13 '19 00:11 rchiossi

Total size before compression: 43184317440

zstd: total size after compression 16528392879, CREATE FULLFILES 13m33.7s

zstd --fast=22: total size after compression 25457700882, CREATE FULLFILES 13m19.447s

zstd --adapt: total size after compression 16354637680, CREATE FULLFILES 13m25.032s

zstd --fast: total size after compression 19045817625, CREATE FULLFILES 13m11.38s

zstd --ultra: total size after compression 16380224911, CREATE FULLFILES 13m22.015s

ashleshaAtrey avatar Nov 13 '19 00:11 ashleshaAtrey

Do you have the numbers for the other compression methods as well (xz, bzip2, gzip)?

rchiossi avatar Nov 13 '19 00:11 rchiossi

xz: total size before compression 43183234560, total size after compression 13877547116, CREATE FULLFILES 14m55.218s

bzip2: total size before compression 43183935488, total size after compression 16014229591, CREATE FULLFILES 12m18.674s

gzip: total size before compression 43182304256, total size after compression 16910068787, CREATE FULLFILES 11m54.421s

ashleshaAtrey avatar Nov 13 '19 05:11 ashleshaAtrey
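To make the size figures above easier to compare, here is a minimal Go sketch that turns them into compression ratios. The byte counts are copied from the comments in this thread; the `result` type and `ratio` helper are names made up for this illustration and are not part of mixer-tools.

```go
package main

import "fmt"

// result holds one row of the measurements quoted in this thread.
// The type is hypothetical and exists only for this illustration.
type result struct {
	name   string
	before int64 // total size before compression, in bytes
	after  int64 // total size after compression, in bytes
}

// ratio returns the compressed size as a fraction of the original size.
func ratio(before, after int64) float64 {
	return float64(after) / float64(before)
}

func main() {
	results := []result{
		{"xz", 43183234560, 13877547116},
		{"bzip2", 43183935488, 16014229591},
		{"zstd", 43184317440, 16528392879},
		{"gzip", 43182304256, 16910068787},
	}
	for _, r := range results {
		fmt.Printf("%-6s %.1f%% of original size\n", r.name, 100*ratio(r.before, r.after))
	}
}
```

With these inputs, xz comes out at roughly 32% of the original size and zstd with no extra flags at roughly 38%, which is the gap being weighed against the CREATE FULLFILES times.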

I think that decompression speed is also worth cross-comparing, because xz tends to be slower than zstd in this area, and swupd has to decompress many files in the course of its operation.

phmccarty avatar Nov 13 '19 07:11 phmccarty

Also note Arjan's old blog post where he conducted a detailed cross-comparison for compression types. It would be awesome to produce a followup to that post with the latest findings.

phmccarty avatar Nov 13 '19 07:11 phmccarty

I ran decompression tests on 906752 fullfiles using the tar utility.

Time taken to decompress:
XZ compression: 11134 seconds
Zstd compression: 19767 seconds
Optimal size (mix of XZ or Zstd files): 12991 seconds

Next, I will work on finding the stats for memory used while compressing and decompressing those files.

ashleshaAtrey avatar Mar 27 '20 18:03 ashleshaAtrey

Memory used while creating fullfiles:

Alloc is bytes of allocated heap objects.
TotalAlloc increases as heap objects are allocated, but unlike Alloc and HeapAlloc, it does not decrease when objects are freed.
Sys is the total bytes of memory obtained from the OS.

Zstd compression: Alloc = 6493 MiB, TotalAlloc = 49925 MiB, Sys = 9213 MiB
xz compression: Alloc = 5850 MiB, TotalAlloc = 49926 MiB, Sys = 9344 MiB

ashleshaAtrey avatar Mar 27 '20 21:03 ashleshaAtrey

Interesting that zstd is taking longer to decompress (and by a fairly significant amount too). That's really surprising.

bryteise avatar Mar 27 '20 23:03 bryteise

@ashleshaAtrey Can you check whether the numbers and units look correct?

| | compression time (minutes) | decompression time (minutes) | memory usage (MiB) | compressed size (bytes) |
|---|---|---|---|---|
| xz | 14.91666667 | 185.5666667 | 9344 | 13877547116 |
| zstd | 13.55 | 329.45 | 9213 | 16528392879 |

reaganlo avatar Mar 30 '20 20:03 reaganlo
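One way to check the units is to convert the raw timings reported earlier in the thread into minutes. The sketch below does this with Go's `time.ParseDuration`; the `minutes` helper is a hypothetical name, and the input strings are the CREATE FULLFILES times and decompression totals quoted above.

```go
package main

import (
	"fmt"
	"time"
)

// minutes parses a duration string such as "14m55.218s" and returns
// its length in minutes.
func minutes(s string) (float64, error) {
	d, err := time.ParseDuration(s)
	if err != nil {
		return 0, err
	}
	return d.Minutes(), nil
}

func main() {
	// xz and zstd compression times, then xz and zstd decompression times.
	for _, s := range []string{"14m55.218s", "13m33.7s", "11134s", "19767s"} {
		m, err := minutes(s)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%-12s = %.2f minutes\n", s, m)
	}
}
```

The compression figures in the summary appear to be rounded down to whole seconds before conversion (14m55s and 13m33s), which accounts for the small difference from the raw values.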

I am very surprised to see zstd have such a large decompression time. In my experience, decompression time for zstd has been dramatically faster than xz...

phmccarty avatar Mar 30 '20 20:03 phmccarty

Thinking about this a little more.

I wonder if there is a set of files which take disproportionately longer to decompress (but there are few of them) or files that take slightly longer to decompress (but there are many of them)?

Figuring this out would be enlightening. Do you have file-by-file time differences?

bryteise avatar Mar 30 '20 21:03 bryteise

I will work on finding the file-by-file time differences; I don't have those stats handy.

ashleshaAtrey avatar Mar 30 '20 21:03 ashleshaAtrey