
Is there a reason to use an `archive/tar`-like API instead of `archive/zip`?

Open ajnavarro opened this issue 4 years ago • 2 comments

First of all, congratulations and thanks for this library.

As far as I know, 7zip uses an index, like zip files.

Going through the code, I can see that all entries are loaded when the 7zip reader is initialized, so calling Next() just iterates through a slice to get the FileInfo:

https://github.com/saracen/go7z/blob/9c09b6bd7fda869ef48ff6f693744a65f477816b/reader.go#L188-L197

My question is: why do we need an iterator-like API if we know the entries beforehand? Maybe an archive/zip-style API would fit this use case better:

```go
zr, _ := go7z.NewReader(readerAt, size)
for _, f := range zr.Files {
    info := f.FileInfo()
    name := f.Name
    reader, _ := f.Open()
    ...
}
```

Sorry in advance if I missed some obvious problem here that makes this impossible. If you think it's a good idea, I'll be happy to help with the implementation.

ajnavarro, Jun 09 '20 09:06

Hey,

I wrote this some time ago, so my memory around it is a little fuzzy. I still find the 7z archive format confusing.

I think the original rationale for the tar-like interface was that my main use case was full archive extraction. Although the FileInfo is accessible up front, a file's content is typically written to a solid block (a folder in 7z terminology), where it is compressed alongside other files. So random access to a file's content isn't as easy as with zip. If you try to extract a specific file and it happens to be at the end of a solid block, the whole block needs decompressing. The zip interface also allows you to Open() multiple files concurrently. With 7z, if you were to open several files within the same solid block, making sure you decompress them efficiently might be difficult.

Having said that, the current interface is somewhat broken. You're supposed to be able to "skip" a file (either by jumping to the next solid block, or by seeking within the current solid block, decompressing and discarding the preceding data), but for some archives this doesn't work.

A fresh pair of eyes on the code, interface and that bug would be great if you're interested in helping out!

saracen, Jun 09 '20 23:06

Thanks a lot for the explanation.

To get more familiar with the codebase, I tried to fix the skip-file problem (just a workaround, so that I can iterate from there with some specific tests).

From here, I can look at the 7zip folder format and check how other libraries handle skipping files that are inside 7zip folders.

Again, thanks a lot for your time!

ajnavarro, Jun 10 '20 11:06