
InflaterInputStream doesn't reset position of underlying stream to end of deflated data

Open jed-unity3d opened this issue 4 years ago • 2 comments

Steps to reproduce

  1. Obtain a file that contains a deflated blob followed by other data. In my case, it is a binary file with a deflated blob followed by some additional ints/floats/etc. The trailing data is what matters: its content is irrelevant (it could be garbage), but it has to exist. The code below assumes there are at least 4 bytes of trailing data.
  2. Run this simple program against it, after filling in appropriate values for InFile, InFileCompressedDataStart, InFileCompressedLength, and InFileUncompressedLength:
    internal class Program
    {
        private static string InFile = @"";

        private static int InFileCompressedDataStart = 0;
        private static int InFileCompressedLength = 0;
        private static int InFileUncompressedLength = 0;
        
        public static void Main(string[] args)
        {
            using (var stream = File.OpenRead(InFile))
            {
                stream.Position = InFileCompressedDataStart;

                //Encounter deflated blob in stream, and inflate it
                using (var zStream = new InflaterInputStream(stream))
                {
                    zStream.IsStreamOwner = false;
                    var buffer = new byte[InFileUncompressedLength];

                    var streamStartPos = stream.Position;
                    Console.WriteLine($"Stream position before inflate: {streamStartPos}");

                    var read = zStream.Read(buffer, 0, InFileUncompressedLength);
                    Console.WriteLine($"InflaterInputStream read {read} bytes");

                    var streamEndPos = stream.Position;
                    Console.WriteLine($"Stream position after inflate: {streamEndPos}");
                    Console.WriteLine($"Stream position moved {streamEndPos - streamStartPos} bytes during read");
                }

                //Continue on with data following deflated blob
                var testBuffer = new byte[4];
                stream.Read(testBuffer, 0, 4);    //This reads from the wrong offset in the file!
            }
        }
    }
  3. Note the console output

Expected behavior

stream.Position should have advanced by exactly InFileCompressedLength, so that the trailing data can be read independently.

Actual behavior

stream.Position advanced by more than InFileCompressedLength, landing partway through completely unrelated data.

Version of SharpZipLib

1.3.1

Obtained from (only keep the relevant lines)

  • Package installed using NuGet

jed-unity3d, Feb 22 '21 19:02

This is probably due to the internal buffering. It might be possible to just seek to the end of the deflated data here if the underlying stream supports it:

https://github.com/icsharpcode/SharpZipLib/blob/06ff713469fd6e1c1cdd2ad3b364248e457a1b96/src/ICSharpCode.SharpZipLib/Zip/Compression/Streams/InflaterInputStream.cs#L665-L668
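Until something like that exists in the library, a rough caller-side sketch of the same idea is to pass your own Inflater and seek the base stream back afterwards. This assumes the stream is seekable, that the blob is zlib-wrapped (the default for new Inflater()), and that Inflater.TotalIn reports the compressed bytes actually consumed (bytes handed to the inflater minus RemainingInput). InflateHelper and InflateAndRealign are made-up names for illustration, not library API:

```csharp
using System;
using System.IO;
using ICSharpCode.SharpZipLib.Zip.Compression;
using ICSharpCode.SharpZipLib.Zip.Compression.Streams;

public static class InflateHelper
{
    // Hypothetical helper: inflates one blob starting at the current position,
    // then moves the base stream to the first byte after the compressed data.
    public static byte[] InflateAndRealign(Stream stream, int uncompressedLength)
    {
        if (!stream.CanSeek)
            throw new NotSupportedException("This sketch needs a seekable stream.");

        long start = stream.Position;
        var inflater = new Inflater(); // same default that new InflaterInputStream(stream) uses
        var buffer = new byte[uncompressedLength];
        long consumed;

        using (var zStream = new InflaterInputStream(stream, inflater))
        {
            zStream.IsStreamOwner = false;

            // A single Read call may return fewer bytes than requested, so loop.
            int total = 0;
            while (total < buffer.Length)
            {
                int read = zStream.Read(buffer, total, buffer.Length - total);
                if (read <= 0) break;
                total += read;
            }
            if (total != buffer.Length)
                throw new InvalidDataException("Blob is shorter than expected.");

            // Let the inflater process the end-of-stream marker and checksum,
            // so that TotalIn covers the whole blob.
            var scratch = new byte[1];
            while (!inflater.IsFinished && zStream.Read(scratch, 0, 1) > 0) { }

            consumed = inflater.TotalIn;
        }

        // Undo the read-ahead done by the internal input buffer.
        stream.Position = start + consumed;
        return buffer;
    }
}
```

With this, a caller would not need to know InFileCompressedLength at all, since the inflater reports where the blob ended.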

It might also be possible to do this:

using (var zStream = new InflaterInputStream(stream, new Inflater(), 1))
{
    // ...
}

This limits the buffer to a single byte, so the stream checks whether more input is needed before reading each byte and only advances the underlying stream when necessary. It will probably be really slow, but as long as the data is byte-aligned it should work afaik.

piksel, Feb 23 '21 12:02

Another possibility would be to use a SubStream. A sample implementation is written here: https://stackoverflow.com/questions/6949441/how-to-expose-a-sub-section-of-my-stream-to-a-user

Basically, the underlying Stream is first wrapped in a SubStream instance whose length is set to the value you already have. This SubStream is then passed to the constructor of InflaterInputStream. The SubStream limits how much can be read from the underlying stream and signals end of stream itself once that limit is reached.

Your code above would change to something like this:

internal class Program
{
    private static string InFile = @"";

    private static int InFileCompressedDataStart = 0;
    private static int InFileCompressedLength = 0;
    private static int InFileUncompressedLength = 0;
    
    public static void Main(string[] args)
    {
        using (var stream = File.OpenRead(InFile))
        {
            stream.Position = InFileCompressedDataStart;

            //Encounter deflated blob in stream, and inflate it
            using (var subStream = new SubStream(stream, InFileCompressedDataStart, InFileCompressedLength))
            {
                using (var zStream = new InflaterInputStream(subStream))
                {
                    zStream.IsStreamOwner = false;
                    var buffer = new byte[InFileUncompressedLength];

                    var streamStartPos = stream.Position;
                    Console.WriteLine($"Stream position before inflate: {streamStartPos}");

                    var read = zStream.Read(buffer, 0, InFileUncompressedLength);
                    Console.WriteLine($"InflaterInputStream read {read} bytes");

                    var streamEndPos = stream.Position;
                    Console.WriteLine($"Stream position after inflate: {streamEndPos}");
                    Console.WriteLine($"Stream position moved {streamEndPos - streamStartPos} bytes during read");
                }
            }

            //Continue on with data following deflated blob
            var testBuffer = new byte[4];
            stream.Read(testBuffer, 0, 4);    //With the SubStream in place, this should now read the trailing data
        }
    }
}
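SubStream is not part of SharpZipLib or the base class library, so it has to be supplied. A minimal read-only sketch, loosely along the lines of the linked answer, could look like the following. The constructor signature (base stream, offset, length) is an assumption chosen to match the call above, and the class deliberately does not dispose the base stream:

```csharp
using System;
using System.IO;

// Hypothetical helper: a read-only, forward-only window over `length` bytes
// of a base stream, starting at `offset`.
public sealed class SubStream : Stream
{
    private readonly Stream _baseStream;
    private readonly long _length;
    private long _remaining;

    public SubStream(Stream baseStream, long offset, long length)
    {
        if (baseStream == null) throw new ArgumentNullException(nameof(baseStream));
        if (offset < 0) throw new ArgumentOutOfRangeException(nameof(offset));
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));

        _baseStream = baseStream;
        _length = length;
        _remaining = length;

        // On a non-seekable stream the caller must already be positioned at `offset`.
        if (baseStream.CanSeek) baseStream.Position = offset;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => _length;

    public override long Position
    {
        get => _length - _remaining;
        set => throw new NotSupportedException();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Report end of stream once the window is exhausted, even if the
        // base stream still has data.
        if (_remaining <= 0) return 0;
        int toRead = (int)Math.Min(count, _remaining);
        int read = _baseStream.Read(buffer, offset, toRead);
        _remaining -= read;
        return read;
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

Because the window never hands out more than `length` bytes, any read-ahead by a wrapping stream stops at the end of the blob and the base stream is left positioned right after it.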

A simpler option, since you have all the values up front: before you continue reading, compare the calculated position with the actual position of the stream and, if they don't match, use stream.Seek or set stream.Position to correct it. If the underlying stream is non-seekable, something like SubStream is the only viable option that doesn't sacrifice a great deal of performance.
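As a sketch of that compare-and-seek step (StreamRealign and SeekPastBlob are made-up names, and a seekable stream is assumed):

```csharp
using System;
using System.IO;

public static class StreamRealign
{
    // Hypothetical helper: puts `stream` on the first byte after a blob that
    // starts at blobStart and is compressedLength bytes long.
    // Returns the correction that was applied (0 if none was needed).
    public static long SeekPastBlob(Stream stream, long blobStart, long compressedLength)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (!stream.CanSeek)
            throw new NotSupportedException("Stream is not seekable; wrap it in something like SubStream instead.");

        long expected = blobStart + compressedLength;
        long correction = expected - stream.Position;
        if (correction != 0) stream.Position = expected;
        return correction;
    }
}
```

In the program from the issue this would be called right after the InflaterInputStream block, as StreamRealign.SeekPastBlob(stream, InFileCompressedDataStart, InFileCompressedLength), before reading testBuffer.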

hjred, Mar 25 '21 16:03