yacreader icon indicating copy to clipboard operation
yacreader copied to clipboard

Add libarchive decompression backend

Open bsdf opened this issue 3 years ago • 19 comments

this PR adds an additional decompression backend using libarchive. i was motivated to write it so that i could take advantage of newer archive formats such as tar.zst for my collection which unarr does not currently support.

some advantages:

  • supports .tar.{gz,bz2,xz,zst} archives which in many cases can provide better compression ratios than 7z with significantly faster decompression times.
  • supports 7z,rar5 and lots of other archive types
  • probably works on osx and windows (untested)

some disadvantages:

  • does not support rar4 solid archives
  • libarchivehas a stream-based architecture that does not (currently) offer random access. in practice, this means that you can only seek forward and would have to re-open an archive to read an entry before the current position. this doesn't seem to have a huge performance hit and can be mitigated by creating properly sorted archives.
  • 7z decompression is slow (but seems to be slightly faster than unarr)

as someone who always repacks the items in their library, these benefits outweigh the tradeoff of losing rar4-solid support. i think this additional backend would offer an appealing option for motivated users to consider.

please let me know if this is something that upstream would be interested in considering

bsdf avatar Dec 06 '21 04:12 bsdf

Technically there is nothing wrong with having another decompression backend. We tested libarchive a few years ago when we were searching for an alternative to 7zip, but we could not use it because of the rar4 situation.

While I am not strictly against adding this as a build option, I am a bit concerned about downstream packagers going that route and shipping YACReader without providing proper rar4 support. This would reflect quite negatively on us as most comics are distributed using rar4 and our app would be unusable for them.

The ideal solution for this issue would be to distribute the decompression backends as loadable modules so we are not stuck with a single option at runtime.

selmf avatar Dec 06 '21 05:12 selmf

I am a bit concerned about downstream packagers going that route and shipping YACReader without providing proper rar4 support. This would reflect quite negatively on us as most comics are distributed using rar4 and our app would be unusable for them.

makes sense

The ideal solution for this issue would be to distribute the decompression backends as loadable modules so we are not stuck with a single option at runtime.

i was thinking this as well, but didn't want to step on too many toes with a larger refactor. if i were to take a stab at it, would something using QPluginLoader be the maintainers' preferred approach?

bsdf avatar Dec 07 '21 00:12 bsdf

QPluginLoader would be fine and the possibility to update the backends outside of the release cycle would be much appreciated.

Stepping on toes is not really an issue, but it would be a good idea if you'd get in touch with @luisangelsm directly and maybe get on our Slack (he will set you up) so we can talk about ideas and make sure it does not interfere with any other work we're doing or planning to do.

selmf avatar Dec 12 '21 07:12 selmf

libarchive 3.6.0 may have fixed the rar4 issue.

alastortenebris avatar Apr 14 '22 06:04 alastortenebris

@bsdf do you have time to update this to use libarchive 3.6.x if it actually fixes the rar4 problem?

luisangelsm avatar Jul 06 '22 14:07 luisangelsm

tested against a few .cbrs snatched off the internet and rar4s identified as a.cbr: RAR archive data, v4, os: Win32 work but the v4 solid archive i created, identified as v4-solid-multi.rar: RAR archive data, v4, os: Unix, flags: Solid does not work.

so i'm not sure? do you have a set of test files i could check?

bsdf avatar Jul 07 '22 03:07 bsdf

No, not really, I just saw what @alastortenebris said here:

libarchive 3.6.0 may have fixed the rar4 issue.

luisangelsm avatar Jul 08 '22 10:07 luisangelsm

As far as I understand the situation, libarchive implemented the rar virtual machine needed to process rar4 compression filters, but it is not tested for all available filters and archive configurations. This could be a bit of a problem as the rar vm can be (ab)used for more things than just running rar4 filters, but as long as it is not shipped by us it is not really our issue.

The limitation on non-solid archives is not really an issue either. You will rarely if ever encounter those in the wild. Just make sure you warn about older libarchive versions not supporting rar4.

@luisangelsm: I'm still quite busy, but I'll make some time to review this once it is ready. Please give me a heads-up when you are considering a merge

selmf avatar Jul 08 '22 15:07 selmf

7-zip is now natively linux compatible and supports rar5 for decompression history

wcasanova avatar Jul 10 '22 18:07 wcasanova

7-zip is now natively linux compatible and supports rar5 for decompression history

7-zip also uses non-free code for RAR decompression, so that's not a viable solution for several Linux distros.

alastortenebris avatar Jul 10 '22 19:07 alastortenebris

The main problem with 7-zip is that it is not a library and does not provide a stable API.

selmf avatar Jul 10 '22 19:07 selmf

7-zip is now natively linux compatible and supports rar5 for decompression history

7-zip also uses non-free code for RAR decompression, so that's not a viable solution for several Linux distros.

Wouldn't the same be the case for libarchive ?

RAR and RAR 5.0 archives (with some limitations due to RAR's proprietary status)

7-zip

  1. 7z.dll: - The "GNU LGPL" as main license for most of the code - The "GNU LGPL" with "unRAR license restriction" for some code - The "BSD 3-clause License" for some code 2) All other files: the "GNU LGPL".

unRAR license restriction


The decompression engine for RAR archives was developed using source
code of unRAR program.
All copyrights to original unRAR code are owned by Alexander Roshal.

The license for original unRAR code has the following restriction:

  The unRAR sources cannot be used to re-create the RAR compression algorithm,
  which is proprietary. Distribution of modified unRAR sources in separate form
  or as a part of other software is permitted, provided that it is clearly
  stated in the documentation and source comments that the code may
  not be used to develop a RAR (WinRAR) compatible archiver.

wcasanova avatar Jul 10 '22 19:07 wcasanova

libarchive doesn't use code from unrar. 7-zip does. Regardless, none of this is relevant to the merge request at hand.

alastortenebris avatar Jul 10 '22 19:07 alastortenebris

I understand, the idea was to find the most viable solution for rar, I don't use this format but in Windows it still has quite a few users

wcasanova avatar Jul 10 '22 19:07 wcasanova

Getting rid of p7zip and update 7zip integration to the latest 7zip version is something I definitely need to tackle. I would prefer to use the same decompression backend across all the versions when possible. If there are problems with licenses or distribution, then I am ok falling back to any other backend. Ideally I would like to see that the same file formats are supported for all the backends, currently the lack or RAR5 support in unarr has been problematic, e.g. https://github.com/YACReader/yacreader/issues/307 but this is something that will change once @selmf adds support for it.

Just to be clear, this PR is truly welcome.

luisangelsm avatar Jul 10 '22 19:07 luisangelsm

I may be wrong about this, but if I recall correctly, libarchive does support more formats than 7-zip, most notably zstd support, in addition to being a cross-platform and completely open source library.

Whether or not it is as performant or stable in yacreader as 7-zip and libunarr is a different story.

alastortenebris avatar Jul 10 '22 20:07 alastortenebris

Supporting a lot of formats is not always a good thing. It adds complexion, increases maintenance burden and adds possible attack vectors for security problems.

The main advantage of using libunarr is that it is a lightweight C99 library that focuses on decompression. That means that we can easily test it and add optimizations and features as needed. The main disadvantage is that we have to do it ourselves and I that I am still working on RAR5 support ;)

As for the difference between between 7-zip and libarchive in format support, it should not matter. 7zip, while being generally widespread and well tested, is a pain to work with and can only be used in environments where we control the distribution, aka Windows, MacOS and in case of Linux container packages.

Libarchive is also widespread and has a codebase that is easier to work with than 7zip, but most Linux distributions do not ship the newest version and in my experience it can be quirky at times where libunarr and 7zip are rock solid. YMMV.

So all backends have their advantages and disadvantages. That is not a problem and we are perfectly happy to support different configurations to support our users needs. We already ship with pdfium as default pdf renderer, but also support poppler since that is much easier to use on a Linux machine.

So please don't argue which backend is better and should be shipped by default. They all have their place.

selmf avatar Jul 11 '22 04:07 selmf

@bsdf are you done with the PR?

luisangelsm avatar Jul 22 '22 18:07 luisangelsm

@luisangelsm ready for review. README will be updated when i tackle any PR feedback

bsdf avatar Jul 26 '22 17:07 bsdf

@bsdf do you have time for updating the PR?

luisangelsm avatar Aug 25 '22 17:08 luisangelsm

@luisangelsm apologies, this slipped my mind. i'll make the updates this weekend

bsdf avatar Aug 25 '22 17:08 bsdf

nothing to apologize for, I want to release 9.9.0 after this is merged, thanks again for contributing to the project

luisangelsm avatar Aug 25 '22 20:08 luisangelsm

@selmf updated the build config to add a branch for win32 and macx, but unfortunately i do not have a systems to test them on. please take a look and let me know if they need any changes

bsdf avatar Aug 27 '22 20:08 bsdf

@bsdf I don't see any error messages in Windows.

qmake YACReader.pro -spec win32-msvc "CONFIG+=libarchive"
Project MESSAGE: Using pdfium.
Project MESSAGE: Using 7zip

But it is kind of expected since the build system doesn't support libarchive for windows or macos. This is what we have in config.pri

win32:!CONFIG(unarr):!CONFIG(7zip) {
  CONFIG += 7zip
}

macx:!CONFIG(unarr):!CONFIG(7zip) {
  CONFIG += 7zip
}

So it just falls back to 7zip silently.

After changing it to this:

win32:!CONFIG(unarr):!CONFIG(7zip):!CONFIG(libarchive) {
  CONFIG += 7zip
}

macx:!CONFIG(unarr):!CONFIG(7zip):!CONFIG(libarchive) {
  CONFIG += 7zip
}

I see the error messages:

qmake YACReader.pro -spec win32-msvc "CONFIG+=libarchive"
Project MESSAGE: Using pdfium.
Project ERROR: Unsupported: the libarchive decompression backend is not currently supported on this system.

So I think adding those missing !CONFIG(libarchive) and letting libarchive-wrapper.pri handle the error is probably the right thing to do.

luisangelsm avatar Aug 30 '22 08:08 luisangelsm

Initial tests on Fedora 36 using libarchive 3.5.3 show no real performance impact, even on archives with large amounts of files going forwards and backwards.

I also tried opening a ZStandard (tar.zst) archive and it worked perfectly fine as well.

I don't have a cbr file to test RAR support.

You can find the repo I'm maintaining here.

alastortenebris avatar Aug 31 '22 18:08 alastortenebris

@luisangelsm thanks for getting this PR over the finish line and congrats on the release! major props to the maintainers making it super easy to swap out decompression backends, without which i wouldn't have even bothered :upside_down_face:

bsdf avatar Sep 03 '22 16:09 bsdf

@bsdf Thank you!

luisangelsm avatar Sep 03 '22 16:09 luisangelsm

works with cbr, rar5 and tar.zst

wcasanova avatar Sep 08 '22 13:09 wcasanova