Determination of archive format

Open eggyal opened this issue 2 years ago • 1 comments

I see that cached-path currently determines how to extract an archive according to its filename extension:

https://github.com/epwalsh/rust-cached-path/blob/db8cafb061ec1ff561747026f5db4317bfbaff7d/src/archives.rs#L17-L23

My problem is that some archives do not use the expected extension (in my case, gzipped tarballs use .tgz rather than .tar.gz). While this could be addressed by expanding or customising the extension list used by cached-path, perhaps it's also an opportunity to consider some alternative approaches:

  • HTTP headers (namely Content-Type and Content-Encoding);
  • detection "magic" as per (or via) the file(1) utility (there are also the magic and bindet crates: the former is a wrapper around the libmagic C library and the latter is not widely used, but both may be useful here); or
  • a user-provided format specifier?

Personally I feel that HTTP headers would be best (if available: obviously not the case for local resources), perhaps falling back to magic and/or file extensions if no other option is available.
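To make that fallback order concrete, here is a rough sketch using only the standard library. Everything in it is hypothetical and not taken from cached-path: the `ArchiveFormat` enum, the function names, and the simplifying assumption that any gzip payload is a gzipped tarball. The magic numbers (`1f 8b` for gzip, `PK\x03\x04` for zip) are the standard ones.

```rust
/// Hypothetical set of formats; the names are illustrative, not cached-path's own.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ArchiveFormat {
    TarGz,
    Zip,
}

/// Guess from a Content-Type value, ignoring parameters such as `; charset=...`.
/// Assumption: a gzip media type means a gzipped tarball.
fn from_content_type(content_type: &str) -> Option<ArchiveFormat> {
    let mime = content_type.split(';').next()?.trim().to_ascii_lowercase();
    match mime.as_str() {
        "application/gzip" | "application/x-gzip" => Some(ArchiveFormat::TarGz),
        "application/zip" => Some(ArchiveFormat::Zip),
        _ => None,
    }
}

/// Guess from the first bytes of the file.
fn from_magic(head: &[u8]) -> Option<ArchiveFormat> {
    if head.starts_with(&[0x1f, 0x8b]) {
        Some(ArchiveFormat::TarGz) // gzip magic number
    } else if head.starts_with(b"PK\x03\x04") {
        Some(ArchiveFormat::Zip) // zip local file header signature
    } else {
        None
    }
}

/// Guess from the file name, accepting both .tar.gz and .tgz.
fn from_extension(name: &str) -> Option<ArchiveFormat> {
    let name = name.to_ascii_lowercase();
    if name.ends_with(".tar.gz") || name.ends_with(".tgz") {
        Some(ArchiveFormat::TarGz)
    } else if name.ends_with(".zip") {
        Some(ArchiveFormat::Zip)
    } else {
        None
    }
}

/// Try the header first, then magic bytes, then the extension.
fn detect(content_type: Option<&str>, head: &[u8], name: &str) -> Option<ArchiveFormat> {
    content_type
        .and_then(from_content_type)
        .or_else(|| from_magic(head))
        .or_else(|| from_extension(name))
}
```

For a local resource one would pass `None` as the content type, so a file named `model.tgz` still resolves through the magic bytes or, failing that, the extension. Content-Encoding is left out of this sketch, since it describes the transfer encoding rather than the archive itself and would need separate handling.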

Happy to submit a PR with whatever approach you feel is most suitable for this library, even if only adding .tgz to the existing extension list?

eggyal, Dec 28 '22 10:12

Hey @eggyal, I would definitely accept a PR for this. I like the idea of using HTTP headers, so I think that should be the first priority. It would also be nice to allow the user to directly specify the format, so if that's straightforward enough to do in the same PR, please go ahead. I'm not opposed to detection "magic" as a fallback as well... that could always be an optional feature of this crate.
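One possible shape for the user-specified format, as a sketch only: an explicit choice wins, and whatever detection produced is used otherwise. `ArchiveFormat`, `ExtractOptions`, `with_format` and `resolve` are hypothetical names invented for illustration; none of them are part of cached-path's current API.

```rust
use std::str::FromStr;

/// Hypothetical set of formats; the names are illustrative, not cached-path's own.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ArchiveFormat {
    TarGz,
    Zip,
}

/// Lets a caller pass the format as a string, e.g. from a CLI flag.
impl FromStr for ArchiveFormat {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "tar.gz" | "tgz" => Ok(ArchiveFormat::TarGz),
            "zip" => Ok(ArchiveFormat::Zip),
            other => Err(format!("unsupported archive format: {other}")),
        }
    }
}

/// Hypothetical options struct carrying an optional explicit format.
#[derive(Debug, Default)]
struct ExtractOptions {
    format: Option<ArchiveFormat>,
}

impl ExtractOptions {
    /// Builder-style setter for the explicit format.
    fn with_format(mut self, format: ArchiveFormat) -> Self {
        self.format = Some(format);
        self
    }

    /// The explicit format takes priority; otherwise use the detected one.
    fn resolve(&self, detected: Option<ArchiveFormat>) -> Option<ArchiveFormat> {
        self.format.or(detected)
    }
}
```

With this shape, a caller who knows the archive is a tarball could write `ExtractOptions::default().with_format("tgz".parse()?)`, and header, magic or extension detection would only be consulted when no format was given.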

epwalsh, Jan 03 '23 16:01