Implement precomp, or write our own, for further compression capabilities.
Not sure if you know of it but https://github.com/schnaader/precomp-cpp
Development on it stalled about two years ago; it seems mp3 compression has a critical bug left unfixed, but besides that nothing major from what I saw.
I have not tested its efficiency yet due to the lack of directory support, but of course dwarfs could take care of that since it compresses files one by one anyway.
I will test whether zstd benefits from this a bit later and send my results. I just wanted to share the idea first; I might not be aware of all the challenges in implementing it.
This is something I would think would be super neat too, but it's unclear what the performance impact might be.
> This is something I would think would be super neat too, but it's unclear what the performance impact might be.
Yeah it can potentially be a notable issue.
I will note that there are ways to decompress files that use deflate, assuming whatever you're using can still work with a decompressed stream. AdvanceCOMP can handle gz, zip, and png files, for example, and qpdf can be used to decompress pdfs.
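As a minimal sketch of that round-trip idea (gzip here since it's universally available; the advzip/qpdf invocations in the comments are the documented equivalents for zip and pdf, and the file names are illustrative):

```shell
# Round-trip a deflate stream: expand it so dedup/compression sees raw bytes,
# then restore it on the way out.
printf 'some deflated payload\n' > sample.txt
gzip -c sample.txt > sample.txt.gz
gzip -dc sample.txt.gz > restored.txt   # expanded stream, ready for dedup
cmp sample.txt restored.txt && echo roundtrip-ok
# Documented equivalents for other containers:
#   advzip -z -0 archive.zip                      # re-store zip entries uncompressed
#   qpdf --stream-data=uncompress in.pdf out.pdf  # expand pdf streams
```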
> I have not tested its efficiency yet
I have now, and re-compressing zip is pretty efficient.
Another alternative to precomp or advancecomp is xtool https://github.com/Razor12911/xtool
I've been following along, but I'm unsure about what the actual feature request is.
From what I've read about the precomp tool, I guess the most straightforward way to use it in the context of DwarFS would be to run files that look like archives through precomp without compression and then let DwarFS take care of deduplication/compression.
Upon accessing the archive in the mounted filesystem (or upon extraction), the data would be run through the equivalent of precomp -r.
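For reference, that flow maps onto precomp's documented flags roughly like this (a sketch only; archive.zip is a placeholder name, and -cn/-r are taken from precomp's README):

```shell
# Sketch of the proposed pipeline:
#   -cn  expand embedded streams, write a .pcf without recompressing
#   -r   restore the original file bit-identically from the .pcf
if command -v precomp >/dev/null 2>&1; then
  precomp -cn archive.zip   # at mkdwarfs time: store archive.pcf instead
  precomp -r archive.pcf    # on access/extraction: rebuild archive.zip
else
  echo "precomp not installed; commands shown for illustration only"
fi
```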
I think this is an interesting idea, but there are a few issues I anticipate:
- It's likely going to significantly slow down compression
- It would require some changes to the filesystem format
- Access to archives in a mounted filesystem would be painfully slow
- Random access (although probably not very useful for archives) even more so
- It might require a separate caching layer
- I have no idea how to best deal with huge precomp'd archives (e.g. too big to fit in the cache)
Maybe the solution to the random access problem is simply not to try to optimize for it. Rather, assume that archives will mostly be read sequentially, and if someone does random access, it's just going to be painfully slow.
I do wonder how well maintained precomp and its potential alternatives are given that the latest release of precomp is from early 2019.
My own viewpoint is that this would be a useful thing, but not necessarily something that belongs within dwarfs. I've looked at precomp's code and it doesn't seem to offer any clear API, so making a filesystem that recompresses precomp'd files would likely take a notable effort.
I will say that this can be useful for, for example, folders with large numbers of deflated files that are either in nonstandard containers or need to be kept in their original compressed state (where AdvanceCOMP or qpdf can't simply be used to decompress them).
Okay, this is likely a can of worms and certainly something that requires more thought to be put into. I'll move it to discussions for now.