nix-index
Consider a different compression level for Zstandard
The default is currently 22, but it looks like in recent versions it's been reduced to 19. Also, level 22 doesn't yield a compression ratio much better than the default (-3), but takes substantially longer. Here are the results for compressing an uncompressed tarball of nixpkgs:
nixpkgs.tar : 19.68% (111411200 => 21922085 bytes, 3.zst)
nix run nixpkgs#zstd -- nixpkgs.tar -${i} -o ${i}.zst 0.45s user 0.05s system 109% cpu 0.458 total
nixpkgs.tar : 19.62% (111411200 => 21862082 bytes, 4.zst)
nix run nixpkgs#zstd -- nixpkgs.tar -${i} -o ${i}.zst 0.51s user 0.05s system 108% cpu 0.514 total
nixpkgs.tar : 19.01% (111411200 => 21174846 bytes, 5.zst)
nix run nixpkgs#zstd -- nixpkgs.tar -${i} -o ${i}.zst 0.81s user 0.04s system 104% cpu 0.817 total
nixpkgs.tar : 18.53% (111411200 => 20642174 bytes, 6.zst)
nix run nixpkgs#zstd -- nixpkgs.tar -${i} -o ${i}.zst 1.00s user 0.04s system 103% cpu 0.999 total
nixpkgs.tar : 17.56% (111411200 => 19560791 bytes, 9.zst)
nix run nixpkgs#zstd -- nixpkgs.tar -${i} -o ${i}.zst 1.92s user 0.04s system 102% cpu 1.916 total
nixpkgs.tar : 16.99% (111411200 => 18927901 bytes, 12.zst)
nix run nixpkgs#zstd -- nixpkgs.tar -${i} -o ${i}.zst 3.77s user 0.06s system 101% cpu 3.780 total
nixpkgs.tar : 16.37% (111411200 => 18241631 bytes, 15.zst)
nix run nixpkgs#zstd -- nixpkgs.tar -${i} -o ${i}.zst 10.58s user 0.07s system 100% cpu 10.608 total
nixpkgs.tar : 15.55% (111411200 => 17325552 bytes, 18.zst)
nix run nixpkgs#zstd -- nixpkgs.tar -${i} -o ${i}.zst 24.79s user 0.10s system 100% cpu 24.834 total
Warning : compression level higher than max, reduced to 19
nixpkgs.tar : 15.27% (111411200 => 17012327 bytes, 22.zst)
nix run nixpkgs#zstd -- nixpkgs.tar -${i} -o ${i}.zst 35.15s user 0.10s system 100% cpu 35.193 total
where the output file denotes the compression level (i.e. 6.zst was compressed with level 6). Looking at the data, it seems like the default of 3 is probably the best? Or at least anything less than 12?
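To make the tradeoff easier to eyeball, here is a rough Python sketch that restates the runs above relative to level 3. The sizes and wall-clock times are copied from the output above; the names `RUNS` and `compare_to_baseline` are just made up for this illustration. Keep in mind the timings include the overhead of `nix run`, so the slowdown factors are only approximate.

```python
# Sketch only: restates the nixpkgs.tar measurements above relative to level 3.
# Tuples are (requested level, compressed bytes, wall-clock seconds).
# The "-22" run was clamped to level 19 by zstd, per the warning in the output.
ORIGINAL_BYTES = 111_411_200

RUNS = [
    (3, 21_922_085, 0.458),
    (4, 21_862_082, 0.514),
    (5, 21_174_846, 0.817),
    (6, 20_642_174, 0.999),
    (9, 19_560_791, 1.916),
    (12, 18_927_901, 3.780),
    (15, 18_241_631, 10.608),
    (18, 17_325_552, 24.834),
    (22, 17_012_327, 35.193),
]


def compare_to_baseline(runs, baseline_level=3):
    """Return (level, percent smaller than baseline, times slower than baseline)."""
    _, base_size, base_secs = next(r for r in runs if r[0] == baseline_level)
    rows = []
    for level, size, secs in runs:
        smaller_pct = 100.0 * (base_size - size) / base_size
        slowdown = secs / base_secs
        rows.append((level, smaller_pct, slowdown))
    return rows


if __name__ == "__main__":
    for level, smaller_pct, slowdown in compare_to_baseline(RUNS):
        print(f"-{level:<2}  {smaller_pct:5.1f}% smaller than -3, {slowdown:6.1f}x the time")
```

Run as a script, it prints one line per level with the size reduction and the slowdown relative to `-3`.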
The reason I initially set it to the highest level is that I expect the index to be created only once, but used many times. Therefore, I assumed the indexing time doesn't really matter that much, whereas disk space is a cost you pay forever. Do you have a use case where a few minutes of extra indexing time has a big impact? Note also that compressing is done at the same time as fetching, and I believe indexing is mostly network-IO bound unless you have a very fast network.
Note that we don't compress the tarball of nixpkgs, but a custom data structure. Using the actual nix-index database, we can run the experiment like this:
$ cd ~/.cache/nix-index
$ tail -c+13 files | zstdcat > files.raw
$ for i in (seq 19); time zstd -$i -o $i.zst files.raw; end
Which gives the following data:
files.raw : 28.82% (88865494 => 25613130 bytes, 1.zst)
________________________________________________________
Executed in 403.78 millis fish external
usr time 414.35 millis 893.00 micros 413.46 millis
sys time 36.67 millis 61.00 micros 36.61 millis
files.raw : 27.08% (88865494 => 24063919 bytes, 2.zst)
________________________________________________________
Executed in 526.18 millis fish external
usr time 546.19 millis 0.00 millis 546.19 millis
sys time 34.99 millis 1.06 millis 33.93 millis
files.raw : 26.07% (88865494 => 23167408 bytes, 3.zst)
________________________________________________________
Executed in 710.79 millis fish external
usr time 707.52 millis 0.00 millis 707.52 millis
sys time 65.11 millis 1.18 millis 63.93 millis
files.raw : 25.80% (88865494 => 22924652 bytes, 4.zst)
________________________________________________________
Executed in 1.01 secs fish external
usr time 1.01 secs 0.00 micros 1.01 secs
sys time 0.06 secs 853.00 micros 0.06 secs
files.raw : 24.96% (88865494 => 22183517 bytes, 5.zst)
________________________________________________________
Executed in 1.73 secs fish external
usr time 1.73 secs 0.00 micros 1.73 secs
sys time 0.05 secs 865.00 micros 0.05 secs
files.raw : 24.66% (88865494 => 21916181 bytes, 6.zst)
________________________________________________________
Executed in 2.81 secs fish external
usr time 2.78 secs 0.00 micros 2.78 secs
sys time 0.07 secs 862.00 micros 0.07 secs
files.raw : 23.99% (88865494 => 21323139 bytes, 7.zst)
________________________________________________________
Executed in 3.74 secs fish external
usr time 3.75 secs 0.00 millis 3.75 secs
sys time 0.05 secs 1.03 millis 0.05 secs
files.raw : 23.64% (88865494 => 21011208 bytes, 8.zst)
________________________________________________________
Executed in 4.74 secs fish external
usr time 4.73 secs 0.00 millis 4.73 secs
sys time 0.06 secs 1.06 millis 0.06 secs
files.raw : 23.49% (88865494 => 20875617 bytes, 9.zst)
________________________________________________________
Executed in 6.70 secs fish external
usr time 6.68 secs 0.00 millis 6.68 secs
sys time 0.07 secs 1.04 millis 0.07 secs
files.raw : 23.14% (88865494 => 20563492 bytes, 10.zst)
________________________________________________________
Executed in 9.68 secs fish external
usr time 9.61 secs 0.00 millis 9.61 secs
sys time 0.12 secs 1.08 millis 0.12 secs
files.raw : 23.05% (88865494 => 20482471 bytes, 11.zst)
________________________________________________________
Executed in 11.77 secs fish external
usr time 11.71 secs 0.00 millis 11.71 secs
sys time 0.11 secs 1.15 millis 0.11 secs
files.raw : 22.93% (88865494 => 20375330 bytes, 12.zst)
________________________________________________________
Executed in 18.25 secs fish external
usr time 18.17 secs 0.00 millis 18.17 secs
sys time 0.12 secs 1.62 millis 0.12 secs
files.raw : 22.79% (88865494 => 20251690 bytes, 13.zst)
________________________________________________________
Executed in 14.76 secs fish external
usr time 14.71 secs 0.00 millis 14.71 secs
sys time 0.09 secs 1.15 millis 0.09 secs
files.raw : 22.65% (88865494 => 20125406 bytes, 14.zst)
________________________________________________________
Executed in 18.62 secs fish external
usr time 18.56 secs 0.00 micros 18.56 secs
sys time 0.09 secs 852.00 micros 0.09 secs
files.raw : 22.54% (88865494 => 20030096 bytes, 15.zst)
________________________________________________________
Executed in 24.12 secs fish external
usr time 24.04 secs 0.00 micros 24.04 secs
sys time 0.09 secs 952.00 micros 0.09 secs
files.raw : 21.36% (88865494 => 18982930 bytes, 16.zst)
________________________________________________________
Executed in 28.88 secs fish external
usr time 28.81 secs 0.00 micros 28.81 secs
sys time 0.08 secs 838.00 micros 0.08 secs
files.raw : 20.67% (88865494 => 18371249 bytes, 17.zst)
________________________________________________________
Executed in 38.83 secs fish external
usr time 38.71 secs 0.00 millis 38.71 secs
sys time 0.10 secs 1.24 millis 0.10 secs
files.raw : 20.51% (88865494 => 18230390 bytes, 18.zst)
________________________________________________________
Executed in 46.56 secs fish external
usr time 46.46 secs 0.00 micros 46.46 secs
sys time 0.07 secs 894.00 micros 0.07 secs
files.raw : 20.30% (88865494 => 18035524 bytes, 19.zst)
________________________________________________________
Executed in 60.52 secs fish external
usr time 60.38 secs 0.00 micros 60.38 secs
sys time 0.10 secs 856.00 micros 0.10 secs
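As a way to read these numbers, the following Python sketch checks which levels are dominated (some other level is both no larger and no slower) and how many bytes each step up saves. The sizes and "Executed in" times are copied from the output above; `RESULTS`, `dominated_levels` and `step_savings` are names invented for this illustration, and since this is a single run per level the timings should be treated as noisy.

```python
# Sketch only: level -> (compressed bytes, wall-clock seconds), copied from
# the fish `time` output above (the "Executed in" line of each run).
RAW_BYTES = 88_865_494

RESULTS = {
    1: (25_613_130, 0.40378),
    2: (24_063_919, 0.52618),
    3: (23_167_408, 0.71079),
    4: (22_924_652, 1.01),
    5: (22_183_517, 1.73),
    6: (21_916_181, 2.81),
    7: (21_323_139, 3.74),
    8: (21_011_208, 4.74),
    9: (20_875_617, 6.70),
    10: (20_563_492, 9.68),
    11: (20_482_471, 11.77),
    12: (20_375_330, 18.25),
    13: (20_251_690, 14.76),
    14: (20_125_406, 18.62),
    15: (20_030_096, 24.12),
    16: (18_982_930, 28.88),
    17: (18_371_249, 38.83),
    18: (18_230_390, 46.56),
    19: (18_035_524, 60.52),
}


def dominated_levels(results):
    """Levels for which another level is no larger and no slower, and better in one."""
    dominated = []
    for level, (size, secs) in results.items():
        for other, (o_size, o_secs) in results.items():
            if other == level:
                continue
            no_worse = o_size <= size and o_secs <= secs
            strictly_better = o_size < size or o_secs < secs
            if no_worse and strictly_better:
                dominated.append(level)
                break
    return sorted(dominated)


def step_savings(results):
    """For each consecutive pair of levels: (from, to, bytes saved, extra seconds)."""
    levels = sorted(results)
    steps = []
    for lo, hi in zip(levels, levels[1:]):
        saved = results[lo][0] - results[hi][0]
        extra = results[hi][1] - results[lo][1]
        steps.append((lo, hi, saved, extra))
    return steps


if __name__ == "__main__":
    print("dominated levels:", dominated_levels(RESULTS))
    for lo, hi, saved, extra in step_savings(RESULTS):
        print(f"-{lo} -> -{hi}: {saved / 1024:8.0f} KiB saved for {extra:+6.2f}s")
```

In this data set, level 12 took longer than level 13 while producing a larger file, so it is the only level the sketch flags as dominated. That may well be measurement noise rather than a real property of zstd.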
I see. In that case, does decompression speed matter? I'd imagine the index would have to be decompressed on every call to command-not-found, but even with -22 it seems fast enough so I'm guessing it doesn't matter much.
I think the compression speed starts to matter more for someone like me with local packages provided via overlay, in which case I may update the index often, but I haven't gotten overlays to work with nix-index so it's a moot point. As a side question, is it possible for overlays to work with nix-index?
In any case it seems like -22 is fine, although something like -16 seems like a decent tradeoff (roughly 2x the speed for about a 1 percentage point worse compression ratio), but the call is yours!
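For reference, the -16 tradeoff can be checked against the files.raw measurements with a few lines of Python. This is only a sketch: the numbers are copied from the level 16 and level 19 runs above, level 19 is used as a stand-in for -22 because the benchmark loop stopped at 19, and the function name `compare` is made up for illustration.

```python
# Sketch only: compares level 16 against level 19 using the files.raw runs above.
# Assumption: level 19 stands in for -22, since the loop only went up to 19.
RAW_BYTES = 88_865_494
LEVEL_16 = (18_982_930, 28.88)  # (compressed bytes, wall-clock seconds)
LEVEL_19 = (18_035_524, 60.52)


def compare(fast, slow, raw_bytes):
    """Summarize what choosing the faster level costs relative to the slower one."""
    fast_size, fast_secs = fast
    slow_size, slow_secs = slow
    extra_bytes = fast_size - slow_size
    return {
        "speedup": slow_secs / fast_secs,
        "ratio_points_lost": 100.0 * extra_bytes / raw_bytes,
        "extra_bytes": extra_bytes,
        "size_increase_pct": 100.0 * extra_bytes / slow_size,
    }


if __name__ == "__main__":
    for key, value in compare(LEVEL_16, LEVEL_19, RAW_BYTES).items():
        print(f"{key}: {value:.2f}")
```

Note that a difference of about one percentage point in compression ratio corresponds to a compressed file that is roughly 5% larger, a bit under 1 MB here.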
Thanks for the quick response.
nix-index depends on the file listings provided by Hydra. So if you have custom overlays, there's no way for nix-index to know which files would be in the output of a derivation if it wasn't built by Hydra. Perhaps we could add packages that are present in the local Nix store to the index.