Add support for optional Zstd seekable format
The seekable format allows random access into the generated compressed files.
https://github.com/facebook/zstd/tree/dev/contrib/seekable_format
I believe Zstd supports this format in the standard, but it's up to the compressor to use it.
I don't know how extensive the changes need to be to support it; I know it requires using independent blocks and creating the jump table.
I believe Zstd supports this format in the standard, but it's up to the compressor to use it.
Well, not really... The seekable format is something like a PoC that was never merged into zstd mainline... However, it is simply "normal" framed compression with a frame table (jump table) in the stream pointing to every frame, and this table would be ignored by zstd decoders (if compliant).
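To illustrate the principle of independent frames plus a jump table, here is a minimal sketch. It uses Python's zlib as a stand-in codec (an assumption made only so the example runs without extra packages; it is not zstd and not the seekable on-disk layout), and the names `compress_seekable` and `read_range` are hypothetical.

```python
import zlib

def compress_seekable(data: bytes, frame_size: int):
    """Compress data as independent frames and build a jump table.

    Returns (blob, table), where table holds one
    (compressed_size, decompressed_size) pair per frame.
    """
    frames, table = [], []
    for start in range(0, len(data), frame_size):
        chunk = data[start:start + frame_size]
        comp = zlib.compress(chunk)  # each frame is decodable on its own
        frames.append(comp)
        table.append((len(comp), len(chunk)))
    return b"".join(frames), table

def read_range(blob: bytes, table, offset: int, length: int) -> bytes:
    """Return data[offset:offset+length], decoding only the frames it touches."""
    out = bytearray()
    end = offset + length
    c_pos = 0  # position in the compressed blob
    d_pos = 0  # position in the decompressed data
    for c_size, d_size in table:
        if d_pos >= end:
            break
        if d_pos + d_size > offset:
            frame = zlib.decompress(blob[c_pos:c_pos + c_size])
            lo = max(offset - d_pos, 0)
            hi = min(end - d_pos, d_size)
            out += frame[lo:hi]
        c_pos += c_size
        d_pos += d_size
    return bytes(out)

if __name__ == "__main__":
    data = bytes(range(256)) * 1000
    blob, table = compress_seekable(data, 4096)
    print(read_range(blob, table, 10000, 5000) == data[10000:15000])
```

Only the frames overlapping the requested range are decompressed; everything before them is skipped by summing sizes from the table.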
I don't know how extensive the changes need to be to support it
It depends on what is expected exactly:
- enhancing the zstd archive type (single-file compression as a .zstd file) could be relatively simple, and as a pro, an enhancement like -eoffs=$offs:$len (7a790bece202b2cef47c9ff8ec11a6ffcd8f95f8) would benefit from it (partial decompression would be faster if the jump table can be detected);
- whereas enhancing the 7z archive type with the zstd method and solid blocks (multiple files as a .7z file) would be more complex and would additionally need:
  - integration with the 7z container's file and block tables;
  - dynamic frame sizes (e.g. to correlate with the file sizes in a solid block);
  - and/or analysis of a better frame size (especially if it remains fixed) depending on the files to archive;
  - recompression support (on update/delete);
  - etc.
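For case 1, detecting the jump table could look roughly like the sketch below. The layout and magic numbers (a skippable frame with magic 0x184D2A5E holding 8- or 12-byte entries, followed by a 9-byte footer ending in 0x8F92EAB1) are written from my reading of the contrib/seekable_format description and should be verified against that repository; `parse_seek_table` is a hypothetical helper, not part of any zstd API.

```python
import struct

# Assumed constants, to be checked against contrib/seekable_format.
SKIPPABLE_MAGIC = 0x184D2A5E
SEEKABLE_MAGIC = 0x8F92EAB1
FOOTER_SIZE = 9  # number of frames (4) + descriptor (1) + magic (4)

def parse_seek_table(buf: bytes):
    """Return [(compressed_size, decompressed_size), ...] or None.

    buf is the whole file content (or at least its tail containing
    the complete seek table frame).
    """
    if len(buf) < FOOTER_SIZE:
        return None
    num_frames, descriptor, magic = struct.unpack("<IBI", buf[-FOOTER_SIZE:])
    if magic != SEEKABLE_MAGIC:
        return None  # no jump table: fall back to sequential decoding
    # Bit 7 of the descriptor is assumed to flag a 4-byte checksum per entry.
    entry_size = 12 if descriptor & 0x80 else 8
    table_size = num_frames * entry_size
    start = len(buf) - FOOTER_SIZE - table_size - 8
    if start < 0:
        return None
    skip_magic, frame_size = struct.unpack("<II", buf[start:start + 8])
    if skip_magic != SKIPPABLE_MAGIC or frame_size != table_size + FOOTER_SIZE:
        return None
    entries = []
    pos = start + 8
    for _ in range(num_frames):
        entries.append(struct.unpack("<II", buf[pos:pos + 8]))
        pos += entry_size  # skips the optional checksum
    return entries
```

With the entries in hand, a reader can sum the sizes to find which frame holds a given uncompressed offset and decode only from there.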
Although it'd be interesting for case 1, I don't think the seekable format as it is implemented right now (with fixed frame sizes in a jump table, without references to files, etc.) is good for case 2. I guess the 7z container can handle it better, but the fundamental principle is not bad: at the moment, to get a single file from a large solid block, it has to "unpack" almost the whole block until it finds the file content. Here is a short example illustrating the current performance issue of solid 7z containers:
- % exec 7z l -slt solid-archive.7z | grep -A 5 "Type = 7z"
+ % exec 7z l -slt nonsolid-archive.7z | grep -A 5 "Type = 7z"
Type = 7z
- Physical Size = 7731705
+ Physical Size = 13158069
- Headers Size = 29860
+ Headers Size = 33408
Method = ZSTD
- Solid = +
+ Solid = -
- Blocks = 1
+ Blocks = 2038
# decompress whole archive:
- % timerate {7z-srv e -so -- solid-archive.7z > NUL:}
- 94150.9 µs/# 11 # 10.621 #/sec 1035.660 net-ms
+ % timerate {7z-srv e -so -- nonsolid-archive.7z > NUL:}
+ 133007.3 µs/# 8 # 7.518 #/sec 1064.058 net-ms
# decompress single file:
- % timerate {7z-srv e -so -- solid-archive.7z file/in/the/midle > NUL:}
- 81742.2 µs/# 13 # 12.234 #/sec 1062.648 net-ms
+ % timerate {7z-srv e -so -- nonsolid-archive.7z file/in/the/midle > NUL:}
+ 8823.96 µs/# 114 # 113.33 #/sec 1005.931 net-ms
So a solid block obviously has a better compression ratio (here for 45MB: 16% for the solid vs. 27% for the non-solid archive), but at the same time it is definitely slow at partial decompression (a single file or a small piece of the archive): here, for a 5KB file, extraction takes 86.8% of the whole-archive decompression time with the solid archive vs. 6.6% with the non-solid archive.
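The percentages for the single-file case can be rechecked from the timings in the listing above; this small sketch only redoes that arithmetic (the variable names are mine):

```python
# Timings from the listing above, in microseconds per run.
solid_whole, solid_single = 94150.9, 81742.2
nonsolid_whole, nonsolid_single = 133007.3, 8823.96

def share(part: float, whole: float) -> float:
    """Single-file extraction time as a percentage of whole-archive time."""
    return 100.0 * part / whole

solid_pct = share(solid_single, solid_whole)
nonsolid_pct = share(nonsolid_single, nonsolid_whole)

print(f"solid:     {solid_pct:.1f}% of whole-archive time")     # 86.8%
print(f"non-solid: {nonsolid_pct:.1f}% of whole-archive time")  # 6.6%
```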
However, all of that is only interesting if someone needs partial decompression in whatever form.
Yeah, I know decompressing partial files may sound useless for most cases.
The full story is that some years ago I implemented a variation of LZ4 with random access in a community-maintained open source game, so the game assets could be stored in LZ4-compressed files while still allowing reads of parts of the uncompressed data and seeking, without having to read the entire file or keep the file in RAM.
My implementation works, but the result is completely incompatible with the LZ4 standard and needs my own compressor and decompressor logic. But it works.
Now I'm looking to replace it with a more "compliant" solution, something that has a better compression ratio, closer to LZMA2, and if possible something that does not need my own programs to create the files and extract them (if needed). That would be great for the users.
As I read that Zstd supports the optional jump table in its specs, I believed this was the ideal format to go with, but to my surprise no compressor supports creating files in this way (neither as single files nor in a container). I did not know that it was never merged into Zstd mainline. I hoped it would be easier to implement here tbh, as I believed it to be part of the official format.
As for containers, I'm not sure what makes the difference (I don't understand enough of the format yet). Is it a problem to have it for both .zstd single files and containers like .7z and .zip?
Is it a problem to have it for both .zstd single files and containers like .7z and .zip?
It is not a problem for .7z either (no idea about .zip right now)... But if you read what I wrote above attentively, you'd surely see that it is more complex for the .7z container (to do it properly), because it would require more work and special integration of the jump table with the .7z container and its solid blocks; otherwise it would be almost useless. In other words: what use would the jump table be if it had no reference to a particular file or its offset in a solid block of the 7z container? Moreover, I guess the format of the 7z container could be even more suitable than the naive jump table of the seekable implementation (which was primarily created for a single file/stream).
Why not use https://peazip.github.io/index.html ?
Why not use https://peazip.github.io/index.html ?
peazip does not support making zstd files with the seekable frame format. No compressor that I know of supports it. The only way to generate a zstd file with the seekable frame format is to use their examples here: https://github.com/facebook/zstd/tree/dev/contrib/seekable_format. I did try them and they do work. But as expected, the resulting compression ratio suffers when using small block sizes, and small blocks are the whole point of this. So I can understand why no one bothers to support it as an option.
What peazip does support is creating zips with XZ compression. Since zstd was a no-go, I tried XZ, which has built-in random access support, and it is working well, just with slow decompression speeds. Unfortunately peazip does not allow selecting the XZ block size in zips, and that's a no-go too :/ It is always something.
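The ratio loss with small block sizes mentioned above can be reproduced in miniature. The sketch below is only an illustration under loud assumptions: it uses zlib instead of zstd or XZ (so the absolute numbers do not transfer), the sample data is synthetic, and `chunked_size` is a hypothetical helper.

```python
import zlib

def chunked_size(data: bytes, frame_size: int) -> int:
    """Total compressed size when every frame is compressed independently."""
    return sum(
        len(zlib.compress(data[i:i + frame_size]))
        for i in range(0, len(data), frame_size)
    )

# Synthetic, repetitive "asset list" standing in for real game data.
data = b"".join(
    b"asset %06d: textures/level_%03d.dat\n" % (i, i % 97)
    for i in range(20000)
)

whole = len(zlib.compress(data))
sizes = {fs: chunked_size(data, fs) for fs in (1024, 16384, 262144)}

print(f"input: {len(data)} bytes, single stream: {whole} bytes")
for fs, total in sizes.items():
    print(f"frame size {fs:>6}: {total} bytes")
```

Each independent frame starts without any history of the previous data, so smaller frames mean finer seeking granularity but a larger archive; choosing the frame size is a trade-off between the two.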