SharpZipLib icon indicating copy to clipboard operation
SharpZipLib copied to clipboard

BZip2InputStream.Length reports an incorrect length

Open ghost opened this issue 4 years ago • 3 comments

Steps to reproduce

  1. Load already compressed bzip2 payload into a MemoryStream
  2. Create BZip2InputStream from said MemoryStream
  3. Assert.AreNotEqual(memorystream.Length, bzip2stream.Length);

Example code is included below.

Expected behavior

BZip2InputStream.Length should return how many bytes the developer can expect to read from the stream.

Actual behavior

BZip2InputStream.Length returns 48, but only 11 bytes are actually available in the stream. Uncompressed data is a string of 11 bytes, compressed data blob is 48 bytes. It seems to just return the length reported by underlying stream.

Version of SharpZipLib

Version 1.3.1

Obtained from

  • Package installed using NuGet

Example code

using System.IO;
using System.Threading.Tasks;
using ICSharpCode.SharpZipLib.BZip2;
using NUnit.Framework;

namespace Tests {
    [TestFixture]
    public class StreamLengthTest {
        private static readonly byte[] CompressedChunk = {
            0x42, 0x5a, 0x68, 0x39, 0x31, 0x41, 0x59, 0x26, 0x53, 0x59, 0xd5, 0xb7, 0x4b, 0xf5, 0x00, 0x00,
            0x03, 0xc1, 0x80, 0x00, 0x10, 0x23, 0x84, 0x44, 0x00, 0x20, 0x00, 0x31, 0x0c, 0x08, 0x20, 0xd1,
            0xb4, 0x8b, 0xa4, 0x28, 0x05, 0x3c, 0x5d, 0xc9, 0x14, 0xe1, 0x42, 0x43, 0x56, 0xdd, 0x2f, 0xd4
        };

        [Test]
        public async Task Test() {
            // Read the bzip2 blob into a memory stream
            await using var memorystream = new MemoryStream(CompressedChunk);
            // Create bzip2 stream from memorystream
            await using var bzip2stream = new BZip2InputStream(memorystream);

            // Sanity checks
            Assert.AreEqual(48, CompressedChunk.Length, "The constant binary blob should have length 48");
            Assert.AreEqual(CompressedChunk.Length, memorystream.Length, "memorystream should be a 1:1 representation of CompressedChunk. Lengths should match.");
            Assert.IsTrue(memorystream.CanRead);
            Assert.IsTrue(bzip2stream.CanRead);

            // Read data from the stream
            var expectedLength = bzip2stream.Length;
            var actualLength = 0;
            byte[] buffer = {0};
            try {
                while (0 < bzip2stream.Read(buffer, 0, 1)) {
                    actualLength++;
                }
            } catch { /* ignored */ }

            // Observe results
            Assert.AreEqual(expectedLength, actualLength, "Actual length should equal reported length");
        }
    }
}

ghost avatar Apr 20 '21 09:04 ghost

Exerpt from src/ICSharpCode.SharpZipLib/BZip2/BZip2InputStream.cs line 158 to 167:

/// <summary>
/// Gets the length in bytes of the stream.
/// </summary>
public override long Length
{
	get
	{
		return baseStream.Length;
	}
}

ghost avatar Apr 20 '21 09:04 ghost

You cannot get the length without decompressing the stream first. The only alternative would be to throw a NotSupportedException instead, which is probably more correct (and is what System.IO.Compression.GZipStream does). The problem is that it might break code that rely on this old behavior.

We should add a deprecation warning (ObsoleteAttribute) on it though.

piksel avatar Apr 20 '21 11:04 piksel

Since uncompressed size is not part of the bzip2 header, I see that this is not possible to implement properly. I would prefer almost anything to silent failure though, since this could cause (and demonstrably has caused) wasted developer hours. Marking it obsolete would be a good first step.

ghost avatar Apr 20 '21 11:04 ghost