
Implement precomp or write own, for further compression capabilities.

Open adminx01 opened this issue 3 years ago • 4 comments

Not sure if you know of it, but there is precomp: https://github.com/schnaader/precomp-cpp

Its development stalled 2 years ago, and it seems the mp3 compression has a critical bug that was left unfixed. Besides that, I saw nothing major.

I have not tested its efficiency yet because it has no directory support, but of course dwarfs could take care of that since it compresses files one by one anyway.

I will test if zstd benefits from this a bit later and send my results. Just wanted to share the idea first, I might not be aware of all the challenges in implementing it.

adminx01 avatar Jul 22 '22 20:07 adminx01

This is something I would think would be super neat too, but it's unclear what the performance impact might be.

tpwrules avatar Jul 23 '22 01:07 tpwrules

> This is something I would think would be super neat too, but it's unclear what the performance impact might be.

Yeah it can potentially be a notable issue.

I will note that there are ways to decompress files that use deflate, assuming whatever you're using can still work with a decompressed stream. AdvanceCOMP can handle gz, zip, and png files, for example. qpdf can be used to decompress pdfs.
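
To illustrate why expanding deflate streams first can pay off, here is a minimal sketch. It assumes only Python's standard `zlib` and `lzma` modules; xz stands in for zstd, and the sample text and helper names (`make_text`, `xz_size`) are made up for the example, not taken from any of the tools mentioned.

```python
import lzma
import random
import zlib


def make_text(seed: int = 0, n_words: int = 20000) -> bytes:
    """Build a made-up, compressible test document (not real data)."""
    rng = random.Random(seed)
    alphabet = b"abcdefghijklmnopqrstuvwxyz"
    vocab = [bytes(rng.choices(alphabet, k=rng.randint(3, 9))) for _ in range(200)]
    return b" ".join(rng.choice(vocab) for _ in range(n_words))


def xz_size(data: bytes) -> int:
    """Size after an outer compressor (xz here, standing in for zstd)."""
    return len(lzma.compress(data, preset=6))


text = make_text()
# The same content, deflated by two "tools" that use different settings.
members = [zlib.compress(text, 1), zlib.compress(text, 9)]

# Outer compression applied to the deflate streams as they are.
size_kept = xz_size(b"".join(members))
# Outer compression applied after expanding each deflate stream.
size_expanded = xz_size(b"".join(zlib.decompress(m) for m in members))

print(f"deflate kept: {size_kept} bytes, deflate expanded: {size_expanded} bytes")
```

The two deflate streams look unrelated to the outer compressor even though they hold identical content, so the redundancy only becomes visible once they are expanded.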

Phantop avatar Jul 23 '22 01:07 Phantop

> I have not tested its efficiency yet

I have now, and re-compressing zip is pretty efficient.

adminx01 avatar Aug 04 '22 13:08 adminx01

Another alternative to precomp or AdvanceCOMP is xtool: https://github.com/Razor12911/xtool

adminx01 avatar Aug 19 '22 17:08 adminx01

I've been following along, but I'm unsure about what the actual feature request is.

From what I've read about the precomp tool, I guess the most straightforward way to use it in the context of DwarFS would be to run files that look like archives through precomp without compression and then let DwarFS take care of deduplication/compression.

Upon accessing the archive in the mounted filesystem (or upon extraction), the data would be run through the equivalent of precomp -r.
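
As a rough sketch of that round trip (an illustration only, not precomp's actual algorithm or file format): expand a zlib stream, remember a compression level that reproduces it byte for byte, and recompress with that level on access. The names `expand` and `restore` are hypothetical, and a real tool would have to search more parameters and handle raw deflate streams inside containers.

```python
import zlib
from typing import Optional, Tuple


def expand(stream: bytes) -> Optional[Tuple[bytes, int]]:
    """Return (raw data, zlib level) if some level reproduces `stream` exactly.

    Returns None when the bytes are not a zlib stream or no level matches,
    in which case the original bytes would have to be stored untouched.
    """
    try:
        raw = zlib.decompress(stream)
    except zlib.error:
        return None
    for level in range(1, 10):
        if zlib.compress(raw, level) == stream:
            return raw, level
    return None


def restore(raw: bytes, level: int) -> bytes:
    """Rebuild the original stream (the rough equivalent of `precomp -r`)."""
    return zlib.compress(raw, level)


original = zlib.compress(b"hello dwarfs, hello precomp. " * 500, 9)
expanded = expand(original)
if expanded is not None:
    raw, level = expanded
    print("bit-exact round trip:", restore(raw, level) == original)
```

The filesystem would store `raw` (which deduplicates and compresses well) plus the small amount of metadata needed to rebuild the stream, which is where the format changes mentioned below come in.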

I think this is an interesting idea, but there are a few issues I anticipate:

  • It's likely going to significantly slow down compression
  • It would require some changes to the filesystem format
  • Access to archives in a mounted filesystem would be painfully slow
  • Random access (although probably not very useful for archives) even more so
  • It might require a separate caching layer
  • I have no idea how to best deal with huge precomp'd archives (e.g. too big to fit in the cache)

Maybe the solution for the random access problem would be simply to not try and optimize for it. Rather, assume that archives will mostly be read sequentially, and if someone does a random access, it's just going to be painfully slow.
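
A minimal sketch of what "not optimizing for random access" could look like, assuming the archive bytes are regenerated on the fly by recompressing the stored raw data. The class `SequentialRestorer` and its parameters are hypothetical and not part of DwarFS; forward reads continue where the last one stopped, while a backward seek restarts the recompression from the beginning.

```python
import zlib


class SequentialRestorer:
    """Regenerates a zlib stream from raw data, tuned for forward reads only."""

    def __init__(self, raw: bytes, level: int, chunk: int = 64 * 1024):
        self.raw, self.level, self.chunk = raw, level, chunk
        self.restarts = 0  # how often a backward seek forced a restart
        self._reset()

    def _reset(self) -> None:
        self._comp = zlib.compressobj(self.level)
        self._consumed = 0   # bytes of raw data fed to the compressor so far
        self._start = 0      # archive offset of the first byte in self._buf
        self._buf = b""
        self._done = False

    def _produce(self) -> None:
        """Feed the next chunk of raw data and buffer the compressed output."""
        piece = self.raw[self._consumed:self._consumed + self.chunk]
        self._consumed += len(piece)
        out = self._comp.compress(piece)
        if self._consumed >= len(self.raw):
            out += self._comp.flush()
            self._done = True
        self._buf += out

    def read(self, offset: int, size: int) -> bytes:
        if offset < self._start:
            # Backward seek: everything before self._start was discarded,
            # so the only option is to start over (the painfully slow path).
            self.restarts += 1
            self._reset()
        while not self._done and self._start + len(self._buf) < offset + size:
            self._produce()
        skip = min(offset - self._start, len(self._buf))
        self._buf = self._buf[skip:]
        self._start += skip
        return self._buf[:size]


raw_data = bytes(range(256)) * 4000
reader = SequentialRestorer(raw_data, level=6, chunk=4096)

# Sequential access: no restarts needed.
parts, pos = [], 0
while True:
    block = reader.read(pos, 1000)
    if not block:
        break
    parts.append(block)
    pos += len(block)
stream = b"".join(parts)

# Random access backwards: works, but triggers a restart.
again = reader.read(10, 50)
```

Only the data from the current read position onward is kept in memory here, which also sidesteps the question of archives too big for the cache, at the cost of making every backward seek as expensive as reading from the start.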

I do wonder how well maintained precomp and its potential alternatives are given that the latest release of precomp is from early 2019.

mhx avatar Oct 23 '22 17:10 mhx

My own viewpoint is that this would be a useful thing but not necessarily something that should be within dwarfs. I've looked at precomp's code and it doesn't seem to offer any form of clear API so it would likely be a notable effort to make a filesystem that recompresses precomp'd files.

I will say that this can be useful for, say, folders with large numbers of deflated files that are either in nonstandard containers or need to be kept in their original compressed state (where AdvanceCOMP or qpdf can't simply be used to decompress them).

Phantop avatar Oct 23 '22 17:10 Phantop

Okay, this is likely a can of worms and certainly something that requires more thought. I'll move it to discussions for now.

mhx avatar Oct 24 '22 06:10 mhx