Add methods to get compressed size and offset for each stream (folder)

Open orisano opened this issue 11 months ago • 1 comments

First of all, thank you for creating and maintaining this excellent Go library for reading 7z files. It has been very helpful for our project.

Background

I'm working on implementing parallel access to 7z files stored in object storage. To achieve this efficiently, I need to be able to:

  1. Know the compressed size of each stream to make ranged GET requests
  2. Know the offset of each compressed stream to access specific parts
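To illustrate why both values are needed, here is a minimal sketch of turning an offset and a compressed size into an HTTP ranged GET. The names `rangeHeader` and `fetchRange` are hypothetical helpers for this example, not part of the library, and the sketch assumes the object store answers a `Range` request with `206 Partial Content`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
)

// rangeHeader formats an HTTP Range header value that covers size bytes
// starting at offset. HTTP byte ranges are inclusive on both ends, so the
// last byte position is offset+size-1.
func rangeHeader(offset, size uint64) (string, error) {
	if size == 0 {
		return "", errors.New("size must be greater than zero")
	}
	return fmt.Sprintf("bytes=%d-%d", offset, offset+size-1), nil
}

// fetchRange downloads exactly one byte range of the object at url.
// It treats anything other than 206 Partial Content as an error, because a
// plain 200 would mean the server ignored the Range header and is sending
// the whole archive.
func fetchRange(ctx context.Context, client *http.Client, url string, offset, size uint64) ([]byte, error) {
	hdr, err := rangeHeader(offset, size)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", hdr)

	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusPartialContent {
		return nil, fmt.Errorf("expected 206 Partial Content, got %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}
```

With the two requested values in hand, fetching one stream would be a single call such as `fetchRange(ctx, http.DefaultClient, url, offset, size)`.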

Feature Request

Please add two new methods to the Reader type:

  1. A method to get the compressed size for each stream
  2. A method to get the offset position of each compressed stream in the archive

Use Case

This would enable efficient parallel processing of 7z files stored in object storage by:

  • Making precise ranged GET requests for specific streams
  • Allowing multiple workers to process different streams concurrently
  • Minimizing unnecessary data transfer
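The concurrent part of this use case could be sketched roughly as follows. `streamRange` and `forEachStream` are hypothetical names for this example only; the callback is where a ranged GET for that stream would go.

```go
package main

import "sync"

// streamRange is a hypothetical description of one compressed stream:
// where it starts in the archive and how many bytes it occupies.
type streamRange struct {
	Offset uint64
	Size   uint64
}

// forEachStream calls fn once per range, using at most workers goroutines.
// The returned slice holds one error per stream, indexed like ranges.
func forEachStream(ranges []streamRange, workers int, fn func(index int, r streamRange) error) []error {
	if workers < 1 {
		workers = 1
	}
	errs := make([]error, len(ranges))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is delivered to exactly one worker, so writes to
			// errs[i] never race with each other.
			for i := range jobs {
				errs[i] = fn(i, ranges[i])
			}
		}()
	}

	for i := range ranges {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return errs
}
```

Bounding the number of workers keeps the number of simultaneous requests to the object store under control, while each worker transfers only the bytes of the stream it was given.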

Technical Details

The requested methods could look something like:

// Returns the compressed size of the specified stream
func (r *Reader) GetStreamCompressedSize(streamIndex int) (uint64, error)

// Returns the offset of the specified stream in the archive
func (r *Reader) GetStreamOffset(streamIndex int) (uint64, error)
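As a rough illustration of what these methods might compute, the sketch below works on a hypothetical `packLayout` type rather than on the library's real internals. It assumes the usual 7z layout: a 32-byte signature header, followed by the packed streams stored back to back starting at `PackPos` (which is relative to the end of that header). Note that this is per packed stream; a folder may consume more than one packed stream.

```go
package main

import "errors"

// signatureHeaderSize is the fixed size in bytes of the 7z signature header
// at the start of the archive.
const signatureHeaderSize = 32

// packLayout is a hypothetical stand-in for the pack information parsed
// from the archive header. It is not a type from the library.
type packLayout struct {
	packPos   uint64   // start of the first packed stream, relative to the end of the signature header
	packSizes []uint64 // compressed size of each packed stream, in archive order
}

var errStreamIndex = errors.New("stream index out of range")

// streamCompressedSize returns the compressed size of packed stream i.
func (l packLayout) streamCompressedSize(i int) (uint64, error) {
	if i < 0 || i >= len(l.packSizes) {
		return 0, errStreamIndex
	}
	return l.packSizes[i], nil
}

// streamOffset returns the absolute offset of packed stream i within the
// archive file: the header size, plus packPos, plus the sizes of all
// packed streams that precede it.
func (l packLayout) streamOffset(i int) (uint64, error) {
	if i < 0 || i >= len(l.packSizes) {
		return 0, errStreamIndex
	}
	offset := uint64(signatureHeaderSize) + l.packPos
	for _, size := range l.packSizes[:i] {
		offset += size
	}
	return offset, nil
}
```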

I'm happy to create a Pull Request for this feature if you think it would be helpful.

orisano, Jan 22 '25 20:01

I'm trying to understand how you'd make use of this. Is this so you would just fetch the raw streams and read them separately? How do you know what compression algorithm(s) are used? Surely you need that information as well?

bodgit, Feb 14 '25 10:02

I've just closed #383 which seems like a similar request.

For this to work universally (i.e. for any file using any algorithm), I would have to expose the compression algorithms as well. You would then need to implement each algorithm yourself, or I would have to export all of my algorithm packages. In the case of BCJ2, I would also need to expose the various bind pairs and pretty much every other internal of the library, just so you could reimplement it after fetching sections of the files over HTTP.

bodgit, Sep 08 '25 10:09