sharpcompress

Add hints about random access availability to archives

Open vpenades opened this issue 2 years ago • 11 comments

Following #707

I've noticed that you don't really know whether you need to read sequentially or can access entries randomly until you have opened the archive. Correct me if I'm wrong:

Plain .TAR files can actually be read randomly, but tar archives inside a gzip archive can't.

So, if that's the case, the SupportsRandomAccess property should go into the IArchive, not in the IArchiveFactory interface, right?

Also, wouldn't the IsSolid property be enough for this? (In that case, IsSolid should be true for .tar.gz files; I've checked and currently it's false.)

If IsSolid serves a different purpose, then IArchive definitely needs a SupportsRandomAccess property.

Any thoughts?

vpenades avatar Nov 28 '22 08:11 vpenades

IsSolid means a specific thing for RAR.

On Streams there's IsSeekable, which basically means what we want, but we need it on IArchive. An alternative is simply not to have IArchive interfaces for non-seekable situations and to tell people to use IReader.

adamhathcock avatar Nov 28 '22 08:11 adamhathcock

Got it... I'll do a new PR with that knowledge

vpenades avatar Nov 28 '22 08:11 vpenades

Hmm... I'm having a hard time exposing IsSeekable on IArchive. Opening a tar.gz reports IsSeekable as true in all the streams I can see around.

Anyway, the problem seems trickier to handle:

On one hand, most archives, including plain TAR, can be opened as IArchive and accessed randomly.

TAR.XX is a special case and needs to be opened using an IReader, because if you try to go through the IArchive path, it's just a Gzip with a single entry.

To explain the problem: what I am trying to build is a general archive reader that relies on IArchiveFactory and does not know the specifics of each archive format. Whenever possible, it should use random access, and only fall back to IReader when that's not possible. For that, I would need a way to know which archives support random access and which do not.

The alternative is to also provide public static IReader Open(Stream stream, ReaderOptions? options = null) to IArchiveFactory

vpenades avatar Nov 28 '22 09:11 vpenades

IReaderFactory has that Open I think.

You don't want to put the stream's IsSeekable on IArchive. You want to return true/false based on the archive/compression format. File streams are always seekable, but the decompressed content usually is not. Zip/Rar compress each file individually, so you can seek to an entry. TarGz/TarBz are one continuous compressed stream, so they're not seekable.
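As a minimal sketch of that idea, the hint can be a lookup keyed by format rather than a property read from the stream. `RandomAccessHints` and its format keys are hypothetical names for illustration and are not part of SharpCompress; the true/false values only restate the formats mentioned in this thread.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of SharpCompress.
public static class RandomAccessHints
{
    // Keyed by container/compression combination, not by stream type:
    // a FileStream over a .tar.gz is seekable, yet its entries are not.
    private static readonly Dictionary<string, bool> ByFormat =
        new Dictionary<string, bool>(StringComparer.OrdinalIgnoreCase)
        {
            ["zip"] = true,      // entries compressed individually
            ["rar"] = true,      // entries compressed individually (per the thread)
            ["tar"] = true,      // plain tar, no outer compression
            ["tar.gz"] = false,  // one continuous compressed stream
            ["tar.bz2"] = false, // one continuous compressed stream
        };

    // Unknown formats report false so callers fall back to sequential reading.
    public static bool SupportsRandomAccess(string format) =>
        ByFormat.TryGetValue(format, out var supported) && supported;
}
```

A caller would consult `RandomAccessHints.SupportsRandomAccess("tar.gz")` before choosing between the archive path and the reader path, independently of whether the underlying stream can seek.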

adamhathcock avatar Nov 28 '22 10:11 adamhathcock

Yes, ReaderFactory is the one that has that Open, so I think an IReaderFactory interface is needed, in the same way that I introduced IArchiveFactory. That way the spaghetti code can be removed and new readers can be registered.

If I follow that path, what I would like to avoid is having to register factories at both Archive* and Reader*, so it could be good to have a single factory class for each archive type, implementing both IArchiveFactory and IReaderFactory.

Maybe a Factory folder, moving all the ZipArchiveFactory-style classes into it and renaming them to ZipFactory, etc.?
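A rough sketch of the single-factory idea, under stated assumptions: the two interfaces below are simplified stand-ins with made-up members (the real SharpCompress interfaces differ), and `ZipFactorySketch` and `FactoryRegistrySketch` are hypothetical names. It only shows how one registration can serve both lookups.

```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for illustration; the real interfaces have other members.
public interface IArchiveFactorySketch { string OpenArchive(string path); }
public interface IReaderFactorySketch { string OpenReader(string path); }

// One class per format, implementing both roles.
public sealed class ZipFactorySketch : IArchiveFactorySketch, IReaderFactorySketch
{
    public string OpenArchive(string path) => "zip archive: " + path;
    public string OpenReader(string path) => "zip reader: " + path;
}

public static class FactoryRegistrySketch
{
    private static readonly List<object> Factories = new List<object>();

    // Registered once, regardless of how many roles the factory implements.
    public static void Register(object factory) => Factories.Add(factory);

    // Each lookup filters the single list by the requested role.
    public static IEnumerable<T> Of<T>() => Factories.OfType<T>();
}
```

After `FactoryRegistrySketch.Register(new ZipFactorySketch())`, both `Of<IArchiveFactorySketch>()` and `Of<IReaderFactorySketch>()` yield the same instance, and a format that only supports one role simply implements one interface.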

vpenades avatar Nov 28 '22 10:11 vpenades

That's not a bad idea: just have singular factory classes, or something similar, to consolidate things.

adamhathcock avatar Nov 28 '22 15:11 adamhathcock

So, SevenZip doesn't have an implementation in the readers factory? Is there a reason for it?

vpenades avatar Nov 29 '22 08:11 vpenades

Added a PR: #709

vpenades avatar Nov 30 '22 09:11 vpenades

> So, SevenZip doesn't have an implementation in the readers factory? Is there a reason for it?

From memory, this is because 7Zip requires random access to the file. The streams need to seek around to properly find headers and decompress the streams in the format. Readers are designed for forward-only access over non-seekable streams.

adamhathcock avatar Dec 05 '22 15:12 adamhathcock

@adamhathcock expanding this topic a bit further: what would be the recommended way to open archives in a generic way?

What I'm trying to achieve is to traverse a number of directories containing all sorts of archives (zip, rar, 7z, etc.), open them, and scan their content.

vpenades avatar Feb 13 '23 14:02 vpenades

You'll have to implement that yourself if you can't guarantee that a Reader will work. It's beyond the scope of the library.

I've been away for personal reasons.

adamhathcock avatar Mar 01 '23 08:03 adamhathcock