ZSTD instead of XZ?

Open • M-Gonzalo opened this issue 1 year ago • 3 comments

Hi! This is a great, great project!

I noticed Burrito is using xz to compress its payload (LZMA2, I'm guessing). It's certainly a good enough algorithm/format, with a good compression ratio and decent decompression speed, and it can be extracted basically anywhere.

In recent years, though, there has been a shift toward zstd for the same use cases where one might otherwise use xz. There are many examples, such as the Arch Linux package format and the squashfs images on live ISOs.

The main reason is that zstd decompresses roughly 10x faster while maintaining a competitive ratio. And, depending on the original uncompressed size, it can deliver much better compression with the --long option (built-in long-range matching), which lets it "see" orders of magnitude more data and include it in its LZ dictionary.
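The window-size argument can be checked with a minimal Python sketch. To stay within the standard library it uses xz's LZMA2 via the `lzma` module rather than zstd, so it demonstrates the general principle behind --long, not zstd itself: an LZ-style compressor can only reference repeats that still fall inside its window (called the dictionary in xz). The payload layout and the 64 KiB and 8 MiB sizes are arbitrary assumptions chosen for illustration.

```python
import lzma
import random


def xz_size(data: bytes, dict_size: int) -> int:
    """Compress `data` as .xz with LZMA2 and return the compressed size in bytes."""
    # Start from preset 6 defaults and override only the dictionary (window) size.
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))


# Synthetic payload: a 128 KiB random block, 128 KiB of unrelated random
# filler, then the same block again. The repeat sits ~256 KiB back.
rng = random.Random(0)
block = rng.randbytes(128 * 1024)
filler = rng.randbytes(128 * 1024)
payload = block + filler + block

small_window = xz_size(payload, 64 * 1024)        # repeat is out of reach
large_window = xz_size(payload, 8 * 1024 * 1024)  # repeat is within reach

print(f"payload:           {len(payload)} bytes")
print(f"64 KiB dictionary: {small_window} bytes")
print(f"8 MiB dictionary:  {large_window} bytes")
```

With the 64 KiB dictionary the second copy of the block is as incompressible as the random filler; with the 8 MiB dictionary it is encoded almost entirely as back-references, so the output shrinks by roughly a third. zstd's --long exploits the same effect by enlarging its window, which matters most when repeats in a payload are far apart.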

The possible downside is that the decompressor is somewhat less widely available (especially on very old, outdated systems).

In summary, zstd would speed up the start time of a Burrito app and probably reduce its size considerably, at the cost of a (possibly) smaller set of supported targets.

M-Gonzalo • Jan 27 '24 19:01

It certainly sounds like a good option for cases where distribution to older systems isn't a concern.

mmower • Jan 28 '24 11:01

It looks like Zig HAS a zstd decompressor already built in! (https://github.com/ziglang/zig/pull/14394)

Oddly enough, it seems they don't have a compressor yet. But this is something we could work with.

doawoo • Feb 11 '24 20:02

The compression algorithm is pretty complex, and the reference implementation has all sorts of clever optimizations, so everyone will probably just use it instead of rewriting it. This could be an opportunity to create an integration with Elixir, though; I'm not sure whether via NIFs, ports, or something else. I'm just getting started with the language, so it's a little over my head for now, but I'll look into it anyway. Maybe it's not that difficult to pull off.

M-Gonzalo • Feb 12 '24 14:02

After some light testing, I can see that zstd produces a much larger packed binary, even for the small cli_example app and even when using a very CPU-heavy, aggressive compression level.

The unpack/first run times:

./example_cli_app_macos_m1  3.45s user 0.43s system 89% cpu 4.357 total
./example_cli_app_macos_m1_zstd  5.90s user 0.44s system 89% cpu 7.065 total
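To put the two runs side by side, here is a small Python sketch that parses the `time` lines above and computes the relative slowdown. The regex assumes the zsh-style `time` format shown; other shells print timings differently, and `parse` is a hypothetical helper written for this example.

```python
import re

# The two `time` lines quoted above (zsh-style output).
TIMINGS = """\
./example_cli_app_macos_m1  3.45s user 0.43s system 89% cpu 4.357 total
./example_cli_app_macos_m1_zstd  5.90s user 0.44s system 89% cpu 7.065 total
"""

# Groups: binary name, user s, system s, CPU %, wall-clock s.
LINE = re.compile(
    r"^(\S+)\s+([\d.]+)s user\s+([\d.]+)s system\s+(\d+)% cpu\s+([\d.]+) total$"
)


def parse(text: str) -> dict[str, float]:
    """Map each binary name to its total (wall-clock) time in seconds."""
    totals = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            totals[m.group(1)] = float(m.group(5))
    return totals


totals = parse(TIMINGS)
xz = totals["./example_cli_app_macos_m1"]
zstd = totals["./example_cli_app_macos_m1_zstd"]
ratio = zstd / xz
print(f"zstd first run: {ratio:.2f}x the xz time (+{(ratio - 1) * 100:.0f}%)")
# prints "zstd first run: 1.62x the xz time (+62%)"
```

In other words, the zstd build's first run (unpack included) took about 62% longer in wall-clock time on this machine.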

It even seems slower during unpack, but that might also be due to Zig's decompressor implementation.

Ultimately, I tend to prefer a smaller binary, so for now I think we'll stick with XZ.

doawoo • Jul 27 '24 00:07