eza icon indicating copy to clipboard operation
eza copied to clipboard

Support archive inspection

Open taminob opened this issue 1 year ago • 25 comments

Description

This PR introduces a new flag (currently -q (for --quasi, see AIX's li program) and --inspect-archives) to allow to list the content of archives. If a file is an archive is solely determined based on the file extension for now.

As a proof of concept, I only added support for .tar archives in this PR, but the changes aim to lay the foundation for the future support of additional archive formats.

Resolves #600

How Has This Been Tested?

Lots of manual testing and new tests/cmd tests (in combination with --recurse):

  • invalid tar archive
  • "simple" tar archive (just files)
  • recursive tar archive (directories etc.)
  • "complex tar archive (symlinks in tar archive, directories, ...)

Additionally, I added a tar archive to the ptests tests which tests for --tree.

taminob avatar Jan 24 '24 11:01 taminob

Ref #756 - apparently reading any files is a no-no

daviessm avatar Jan 24 '24 12:01 daviessm

Ref #756 - apparently reading any files is a no-no

Thanks for bringing that up - that would, of course, make this feature impossible to implement. Only compromise I could offer would be that on any error (invalid archive, IO error, etc.), it will fail silently and the file will simply be listed like any regular file. The atime could be manually reset to the previous atime, but not sure if I'm a fan of that solution.

Since I'd propose to hide this feature behind a special flag anyway, we could also say that the atime will not be modified except if a feature flag is given that has to read a file.

@cafkafk @MartinFillon ?

taminob avatar Jan 24 '24 12:01 taminob

I personnally doesn't care that much about that I was just saying back what was said on previous issues. But maybe feature flagging options that read files content (except for git) is a good idea. This is also debatable, do we prefer having it under a feature flag but still in the codebase or having it under a separate crate (still feature flagged). As for the silent fail, is it possible to replicate the broken link case to atleast show the error message on it ?

MartinFillon avatar Jan 24 '24 20:01 MartinFillon

I personnally doesn't care that much about that I was just saying back what was said on previous issues. But maybe feature flagging options that read files content (except for git) is a good idea. This is also debatable, do we prefer having it under a feature flag but still in the codebase or having it under a separate crate (still feature flagged). As for the silent fail, is it possible to replicate the broken link case to atleast show the error message on it ?

Thanks for your thoughts on the topic - since I'm still early in development and don't really have a final design in mind yet, I can't really say much how easy that broken link/error message thing would be, but it sounds like a good idea.

So then I'll go on and we can see later if we want to move the archive stuff into a separate crate.

taminob avatar Jan 26 '24 09:01 taminob

I'm following along...

I just wanted to mention that I think using -q and --quasi are acceptable for the flags, not just because that's what li used, but also because it doesn't clash too terribly with the syntax of lsd (no -q) nor ls (where on GNU and most BSD systems, -q is short for --hide-control-chars).

I also think it would be a useful future addition to attempt identification of archives by contents (i.e. file magic) if a regular file is specified with a trailing slash.

johnsonjh avatar Jan 29 '24 21:01 johnsonjh

I finished a first proposal, if you want to have a look at it - it currently only consists of .tar archive inspection and only uses the file extension to determine its file type. To at least somewhat reduce the complexity of this PR, I'd propose to include detection based on file content etc. in follow-up features - I already kept that in mind and the code should be easily extendable regarding that.

To avoid code duplication, I introduced a common trait for filesystem files (File) and the files in an archive (ArchiveEntry) which I called Filelike for now (better name suggestions are welcome). Unfortunately, this requires to touch a lot of different places in the code base (especially since File had some public members which were used in different places which had to be wrapped into trait functions), but without a lot of code copying and specializing for both File and ArchiveEntry I didn't see any other way to implement this.

I left a couple of TODOs in the code with small remarks, feel free to leave a comment if you have an opinion. Also, tests are still missing and Windows fails to compile - that's still in progress I also already looked at how to support ZIP archives, but would also leave that to a separate PR.

Looking at the current implementation in src/fs/archive.rs I think that splitting it off into a separate crate could actually make sense. Since it will probably be moved anyway, I kept it for now as-is in a single, way too overloaded file. How do you want to proceed for that?

taminob avatar Feb 06 '24 20:02 taminob

Gonna take a look at the code, but this seems promising, I think bringing this is a great advancement.

MartinFillon avatar Feb 07 '24 08:02 MartinFillon

The changes make the Windows build very broken because the refactoring requires modes::from_mode to parse the permissions of a FileLike but libc::S_IRUSR and co. don't exist on Windows. I can foresee trying to display Unix-like permissions of a tar in the Windows long output being problematic too.

Out of scope but on Windows, ZIP inspection would be much more useful than tar.

How useful is uncompressed tar inspection? I thought most tars that people would come across would be compressed in some way.

daviessm avatar Feb 07 '24 08:02 daviessm

It's a good start - compression can be figured out later, I'm sure.

I can foresee inspection of tar, cpio (and RPM as a special case of cpio), ar (.a archive), and zip as being the most important. Eventually it would be nice to see 7z, pax, ISO, CAB, rar, and lha support, and support for decompressing various streaming "outer" compression methods (lzip, bzip2, gzip, compress, etc.)

(I'd normally use libarchive for something like this, if I was working in C.)

johnsonjh avatar Feb 07 '24 09:02 johnsonjh

It seems to not work on windows, could you try to gate it with a #[cfg(any(target_os = "macos", target_os = "linux"))] to make it macos/linux only? That should resolve the CI issues.

Thanks, didn't look at Windows yet at all. Will fix the CI and the other review comments this weekend.

I can foresee trying to display Unix-like permissions of a tar in the Windows long output being problematic too.

I will take a look at the Windows long output and how the permissions are displayed there, but don't really have any kind of Windows development setup to test it in a live environment. Hints on how to solve potential incompatibilities would be appreciated.

How useful is uncompressed tar inspection? I thought most tars that people would come across would be compressed in some way.

It's mainly as a proof-of-concept since the format is pretty easy to deal with. During testing, I also noticed that neovim can actually also open tar archives which I didn't know before.

(I'd normally use libarchive for something like this, if I was working in C.)

Fair point, I actually found https://github.com/fnichol/libarchive-rust as a Rust wrapper, but last commit was 6 years ago and not sure if we want to add that as an dependency.

taminob avatar Feb 07 '24 23:02 taminob

I've tested it on Windows and it works but only seems to read a .tar file if it's specified explicitly, is that intentional? That is, if I specify --quasi --tree or --quasi -R for example, the .tar is shown but the contents are not listed.

daviessm avatar Feb 10 '24 10:02 daviessm

I've tested it on Windows and it works but only seems to read a .tar file if it's specified explicitly, is that intentional? That is, if I specify --quasi --tree or --quasi -R for example, the .tar is shown but the contents are not listed.

Actually, I forgot that and only had explicitly specified tar archives in mind. Do you think that's an actual use case? I can have a look at the code and how to integrate that, but could be that it's not entirely straight-forward (except maybe iterating another time recursively over all files, that would be pretty easy but potentially expensive from a performance perspective).

Edit: I just pushed a draft that archives are also inspected in combination with -R (but does not work with --tree yet).

taminob avatar Feb 10 '24 12:02 taminob

(but does not work with --tree yet)

That is strange cause as of what I know tree is the same as -R for most part

MartinFillon avatar Feb 10 '24 19:02 MartinFillon

(but does not work with --tree yet)

That is strange cause as of what I know tree is the same as -R for most part

You can take a look at https://github.com/eza-community/eza/commit/60bdedd0a4b3996eb6f159d8a43276bbb21d9cee, I added a TODO to the place where --tree is handled. In main.rs there is explicitly an if !tree. To allow --tree with archive inspection probably requires to_archive to be added to the Filelike trait which I tried to avoid to prevent bloating the trait. But shouldn't be too bad, I can do so later.

Or are there any other opinions if the archive inspection should also work for not explicitly specified archives?

taminob avatar Feb 11 '24 11:02 taminob

Okay nice didnt catch that

MartinFillon avatar Feb 11 '24 14:02 MartinFillon

We were chatting about feature gates in the Matrix channel earlier and I think that if we're going to introduce compressed archive inspection in the future it would probably make sense to put that behind a feature gate because it'll pull in compression libraries. Which, by extension, means that this feature should probably be feature gated too so we don't have to either break it when we gate it later, or explain that inspection of some archives is enabled by a feature.

I also think the feature should be enabled by default, though.

daviessm avatar Feb 11 '24 17:02 daviessm

@daviessm I agree regarding the feature flag - do you want to take a look at https://github.com/eza-community/eza/pull/797/commits/1dffecba73030e66f08d22cccb95c487f4ad9875 if you think that's what you meant? At first, I tried to actually remove every part of the code containing archive, but that lead to quite a mess with way too many conditional compilation attributes - thus, I decided to only remove the tar dependency and basically still handle archives, but without any supported format. Since the flag for the git feature also exists without the feature enabled, I did the same here and the flag is just ignored if passed.

Otherwise, I hopefully incorporated the other feedback as well - but feel free to give additional feedback.

The only thing I'm still struggling with is the nix environment that somehow fails to compile eza now due to linking issues in libgit2.so (/usr/lib/libgit2.so: undefined reference to pcre2_get_error_message_8', /usr/lib/libgit2.so: undefined reference to libssh2_session_disconnect_ex', /usr/lib/libgit2.so: undefined reference to SSL_CTX_set_verify@OPENSSL_3.0.0', ...). I don't quite understand which change causes this regression - previously I only had a linking issue with zlib which I was able to resolve (at least locally) by adding it to the dependencies in flake.nix, but wasn't able to fix the current issues - does anyone else have an idea? I guess my addition with zlib is also just a workaround masking the actual issue?

taminob avatar Feb 17 '24 17:02 taminob

@daviessm I agree regarding the feature flag - do you want to take a look at https://github.com/eza-community/eza/commit/1dffecba73030e66f08d22cccb95c487f4ad9875 if you think that's what you meant?

Yeah that makes sense, the only other thing is I think we should remove the help text too but I don't think we do that for git either and it'll be another thing to do after clap gets merged.

daviessm avatar Feb 17 '24 19:02 daviessm

Yeah that makes sense, the only other thing is I think we should remove the help text too but I don't think we do that for git either and it'll be another thing to do after clap gets merged.

Fair point, looks like I forgot to add the flags to the help text altogether. I added it to the display options in https://github.com/eza-community/eza/pull/797/commits/50d6221fff83e99085754f2b90332b959efc65f5.

taminob avatar Feb 17 '24 20:02 taminob

@cafkafk @MartinFillon ping, not entirely sure who else to ping. I rebased the PR so that we can maybe get it ready for merge - but feel free to nitpick as much as you want.

Some help with the CI would be appreciated, but I don't think any change in this PR broke the BSD workflows.

taminob avatar Mar 28 '24 13:03 taminob