Suggestion: multi-member GZIP and unmarkable streams

Open ajohnson1 opened this issue 3 years ago • 2 comments

The GZIP specification, RFC 1952, allows a file to contain multiple members, where each member has a header, a DEFLATE-compressed section, and a trailer.
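To make the multi-member layout concrete, here is a minimal sketch using only the JDK's `java.util.zip` classes (not DEFLATE-library-Java). It writes each string as its own complete member, concatenates the members, and reads the result back. The class and method names are made up for illustration, and it assumes a JDK whose `GZIPInputStream` continues across concatenated members, which recent versions do as far as I know.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class MultiMemberDemo {
    /** Compresses text into one complete GZIP member (header, DEFLATE data, trailer). */
    static byte[] gzipMember(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    /** Concatenates one member per part, then decompresses the whole file. */
    static String concatAndRead(String... parts) {
        try {
            ByteArrayOutputStream file = new ByteArrayOutputStream();
            for (String part : parts) {
                file.write(gzipMember(part)); // members are simply appended back to back
            }
            try (GZIPInputStream in =
                    new GZIPInputStream(new ByteArrayInputStream(file.toByteArray()))) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The decompressed output is the concatenation of the members' contents, so a reader that stops after the first trailer silently loses everything after it.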

It's possible to handle these with DEFLATE-library-Java by using a markable underlying stream: read the header from the underlying stream, run the inflater, call detach to reset the underlying stream to just after the compressed data, read the trailer from the underlying stream, read the next header (if present) from the underlying stream, then create a new inflater. This works, and normally it is possible to make a stream markable by wrapping it in a BufferedInputStream.

Sometimes this isn't so easy. For example, Eclipse Memory Analyzer creates a random access view of a GZIP file by having multiple GZIPInputStreams all based on the same underlying RandomAccessFile, and seeking the underlying file to the correct position for a given GZIPInputStream before using that stream. This goes wrong with multiple members because the mark positions would have to be switched as well, which isn't so easy, though it could be done.

An idea I had was to have a different detach mode: after the end of a member (and a -1 return), the caller could detach the inflater and then read, through the inflater stream, first the unprocessed data left in its input buffer and then data from the underlying stream. Once the caller had read the new header, the inflater could be restarted on the new member with an attach() call. This could be hidden inside a GZIPInputStream so that its caller never needs to be aware of the multiple members.
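As a rough illustration of the idea, here is a sketch that avoids mark/reset entirely. It is not the change made for Memory Analyzer and does not use DEFLATE-library-Java: it swaps in the JDK's `java.util.zip.Inflater` and a `PushbackInputStream`, and the class and method names are hypothetical. When a member ends, the input bytes the inflater did not consume are pushed back, so the trailer and the next header can be read from the same stream before the inflater is restarted. For brevity it skips the trailer instead of verifying CRC32 and ISIZE, and it treats any trailing non-GZIP bytes as an error.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

final class MultiMemberGunzip {
    private static final int BUF = 4096;

    /** Decompresses every GZIP member in the stream; no mark/reset support is needed. */
    static byte[] gunzipAll(InputStream raw) throws IOException, DataFormatException {
        PushbackInputStream in = new PushbackInputStream(raw, BUF);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Inflater inf = new Inflater(true); // true = raw DEFLATE, no zlib wrapper
        byte[] inBuf = new byte[BUF];
        byte[] outBuf = new byte[BUF];
        try {
            int first;
            while ((first = in.read()) != -1) { // another member follows
                in.unread(first);
                skipHeader(in);
                inf.reset(); // the "attach" step: restart on the new member
                int n = 0;
                while (!inf.finished()) {
                    if (inf.needsInput()) {
                        n = in.read(inBuf);
                        if (n < 0) throw new EOFException("truncated GZIP member");
                        inf.setInput(inBuf, 0, n);
                    }
                    out.write(outBuf, 0, inf.inflate(outBuf));
                }
                // The "detach" step: hand back the input the inflater did not consume.
                int rem = inf.getRemaining();
                in.unread(inBuf, n - rem, rem);
                skip(in, 8); // trailer: CRC32 and ISIZE, not verified in this sketch
            }
        } finally {
            inf.end();
        }
        return out.toByteArray();
    }

    private static int readByte(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) throw new EOFException("unexpected end of GZIP data");
        return b;
    }

    private static void skip(InputStream in, int count) throws IOException {
        for (int i = 0; i < count; i++) readByte(in);
    }

    /** Skips an RFC 1952 member header, including its optional fields. */
    private static void skipHeader(InputStream in) throws IOException {
        if (readByte(in) != 0x1f || readByte(in) != 0x8b || readByte(in) != 8) {
            throw new IOException("not a GZIP member using DEFLATE");
        }
        int flags = readByte(in);
        skip(in, 6); // MTIME (4 bytes), XFL, OS
        if ((flags & 0x04) != 0) { // FEXTRA: 2-byte little-endian length, then data
            int xlen = readByte(in) | (readByte(in) << 8);
            skip(in, xlen);
        }
        if ((flags & 0x08) != 0) while (readByte(in) != 0) { } // FNAME, zero-terminated
        if ((flags & 0x10) != 0) while (readByte(in) != 0) { } // FCOMMENT, zero-terminated
        if ((flags & 0x02) != 0) skip(in, 2); // FHCRC
    }
}
```

The pushback buffer is the same size as the input buffer, so the unconsumed tail of the last chunk always fits. A wrapper stream built this way could hide the member boundaries from its caller, as suggested above.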

I've made the change to a private version of InflaterInputStream used by Memory Analyzer but perhaps people have some better ideas as to how this could be accomplished.

ajohnson1, Apr 19 '21 19:04