DotNetZip.Semverd

ZipInputStream.GetNextEntry() performance

Open • ggrenon opened this issue 7 years ago • 1 comment

I've compared read and write times when zipping data with no compression (0%), using a ZipOutputStream versus ZipFile.AddFile().

Write times are similar between ZipOutputStream and ZipFile, but I've noticed that the read times are different. I use ZipInputStream to read and both archives contain a single entry.

What I've found is that reading the archive written using the ZipOutputStream is ~33% slower than reading the one written using ZipFile.AddFile().

The time difference comes from ZipInputStream.GetNextEntry(). It executes almost instantly on the ZipFile archive, but takes a long time on the ZipOutputStream archive.

Is this expected behavior? Is there a way to zip data from a stream without this read time penalty? Here's my test code in case there's something wrong with it:

// Write
var sw = new Stopwatch();
sw.Start();
using ( FileStream fs = new FileStream( /* file to zip path */, FileMode.Open ) )
      using ( FileStream zipFS = new FileStream( /* zip path */, System.IO.FileMode.OpenOrCreate ) )
      using ( ZipOutputStream zipOS = new ZipOutputStream( zipFS ) )
      {
        zipOS.CompressionMethod = CompressionMethod.None;
        zipOS.CompressionLevel = CompressionLevel.Level0;
        zipOS.PutNextEntry( "CycleZip12" );
        fs.CopyTo( zipOS );
      }
sw.Stop();
Console.WriteLine( $"Stream write time : {sw.ElapsedMilliseconds} ms" );
sw.Reset();
sw.Start();
      using ( ZipFile zip = new ZipFile() )
      {
        zip.CompressionMethod = CompressionMethod.None;
        zip.CompressionLevel = CompressionLevel.Level0;
        ZipEntry entry = zip.AddFile(/* test file */);
        zip.Save( /* zip path */ );
      }
sw.Stop();
Console.WriteLine( $"File write time : {sw.ElapsedMilliseconds} ms" );
sw.Reset();
//Read
sw.Start();
using ( FileStream zipFS = new FileStream( /* Path to 1st zip archive */, FileMode.Open ) )
      using ( ZipInputStream zipIS = new ZipInputStream( zipFS ) )
      using ( BinaryReader zipReader = new BinaryReader( zipIS ) )
      {

        ZipEntry entry = zipIS.GetNextEntry();
        for ( int i = 0; i < entry.UncompressedSize / 4; i++ )
        {
          Int16 real = zipReader.ReadInt16();
          Int16 imag = zipReader.ReadInt16();
        }
      }
sw.Stop();
Console.WriteLine( $"Stream read time : {sw.ElapsedMilliseconds} ms" );
sw.Reset();

sw.Start();
using ( FileStream zipFS = new FileStream( /* Path to 2nd zip archive */, FileMode.Open ) )
      using ( ZipInputStream zipIS = new ZipInputStream( zipFS ) )
      using ( BinaryReader zipReader = new BinaryReader( zipIS ) )
      {

        ZipEntry entry = zipIS.GetNextEntry();
        for ( int i = 0; i < entry.UncompressedSize / 4; i++ )
        {
          Int16 real = zipReader.ReadInt16();
          Int16 imag = zipReader.ReadInt16();
        }
      }
sw.Stop();
Console.WriteLine( $"File read time : {sw.ElapsedMilliseconds} ms" );

Thank you.

ggrenon, Sep 27 '18 18:09

Thanks for reporting this bug/problem, and sorry about the delay in getting back to you. This is a self-service repository, where I merge PRs and where the merging of PRs causes nugets to be pushed automatically (if you bump the version number in your PR). I'll leave this issue open until someone (or yourself) fixes it.

haf, Nov 13 '18 07:11
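
One possible explanation, not confirmed anywhere in this thread: the ZIP format lets a writer set bit 3 of the general-purpose flags and leave the sizes in the local file header as zero, storing them in a data descriptor after the entry data instead. A sequential reader then has to locate that descriptor before it knows the entry size, which could account for a slow GetNextEntry(). The sketch below is a way to check whether the two archives differ in this respect. It uses only System.IO and the header layout from the ZIP specification; ZipHeaderProbe and LocalHeaderInfo are hypothetical names, not part of DotNetZip.

```csharp
using System;
using System.IO;

// Hypothetical diagnostic helper, not part of DotNetZip.
// Reads the first local file header of a zip archive and reports
// whether the entry defers its sizes to a trailing data descriptor.
public static class ZipHeaderProbe
{
    public sealed class LocalHeaderInfo
    {
        public bool UsesDataDescriptor;   // general-purpose flag bit 3
        public uint CompressedSize;       // 0 when deferred to the descriptor
        public uint UncompressedSize;     // 0 when deferred to the descriptor
    }

    public static LocalHeaderInfo ReadFirstLocalHeader(Stream zip)
    {
        // BinaryReader reads little-endian values, matching the ZIP format.
        var reader = new BinaryReader(zip);

        if (reader.ReadUInt32() != 0x04034b50)
            throw new InvalidDataException("Stream does not start with a local file header.");

        reader.ReadUInt16();                  // version needed to extract
        ushort flags = reader.ReadUInt16();   // general-purpose bit flags
        reader.ReadUInt16();                  // compression method
        reader.ReadUInt32();                  // last-modified time and date
        reader.ReadUInt32();                  // CRC-32
        uint compressed = reader.ReadUInt32();
        uint uncompressed = reader.ReadUInt32();

        return new LocalHeaderInfo
        {
            UsesDataDescriptor = (flags & 0x0008) != 0,
            CompressedSize = compressed,
            UncompressedSize = uncompressed
        };
    }
}
```

Calling ZipHeaderProbe.ReadFirstLocalHeader( File.OpenRead( /* zip path */ ) ) on each archive and comparing the results would show whether the ZipOutputStream archive sets bit 3 while the ZipFile archive does not. If that turns out to be the difference, it points at the writer side rather than at the test code.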