SharpZipLib: FastZip.ExtractZip outputs a partial result for a zip file with 2 or more big (~1.5GB) data files (which are not zips)

Open WaseemK88 opened this issue 3 years ago • 11 comments

Steps to reproduce

1. Create a ZIP with 3 folders.
2. Put 2 or more big (1.5GB) data files in one of the folders (txt, File or any type, but not a zip).
3. Extract the ZIP using the FastZip.ExtractZip method.

Expected behavior

Extraction produces the same files that were put into the zip.

Actual behavior

Only one of the big files is extracted; the others do not appear.

Version of SharpZipLib

1.3.1

WaseemK88, Feb 21 '22 05:02

Is this a single archive, or can you reproduce it with different content?

piksel, Feb 22 '22 10:02

I can reproduce it with different content. Try the steps to reproduce above.

WaseemK88, Feb 23 '22 08:02

How are you creating the file? Using SharpZipLib or something else?

piksel, Feb 23 '22 09:02

Tried to reproduce using these steps:

$ mkdir -p /tmp/006/foo
$ mkdir -p /tmp/006/bar
$ mkdir -p /tmp/006/baz

$ dd if=/dev/urandom of=/tmp/006/bar/file1 bs=1M count=1500
1500+0 records in
1500+0 records out
1572864000 bytes (1.6 GB, 1.5 GiB) copied, 27.5715 s, 57.0 MB/s

$ dd if=/dev/urandom of=/tmp/006/bar/file2 bs=1M count=1500
1500+0 records in
1500+0 records out
1572864000 bytes (1.6 GB, 1.5 GiB) copied, 28.0475 s, 56.1 MB/s

$ cd /tmp/006

$ zip -rv 006.zip foo/ bar/ baz/
  adding: foo/    (in=0) (out=0) (stored 0%)
  adding: bar/    (in=0) (out=0) (stored 0%)
  adding: bar/file1 ......................................................................................................................................................     (in=1572864000) (out=1573118152) (deflated 0%)
  adding: bar/file2 ......................................................................................................................................................     (in=1572864000) (out=1573118278) (deflated 0%)
  adding: baz/    (in=0) (out=0) (stored 0%)
total bytes=3145728000, compressed=3146236430 -> 0% savings

Analyzing that file using ArchiveDiag produces the following report: https://pub.p1k.se/sharpziplib/archivediag/issue729.html

Everything seems to be read fine, and all files are listed.

piksel, Feb 23 '22 09:02

@piksel, I am not seeing the "Extract the Zip using FastZip.ExtractZip method" step. The problem is that when I extract the files using FastZip.ExtractZip, some of the files are not extracted.

WaseemK88, Feb 23 '22 11:02

PS> dotnet new console
The template "Console App" was created successfully.

Processing post-creation actions...
Running 'dotnet restore' on C:\wrk\006\tester\tester.csproj...
  Determining projects to restore...
  Restored C:\wrk\006\tester\tester.csproj (in 63 ms).
Restore succeeded.

PS C:\wrk\006\tester> dotnet add package sharpziplib
  Determining projects to restore...
  Writing C:\Users\nilma.CONFIGURA\AppData\Local\Temp\tmpB871.tmp
info : Adding PackageReference for package 'sharpziplib' into project 'C:\wrk\006\tester\tester.csproj'.
info :   GET https://api.nuget.org/v3/registration5-gz-semver2/sharpziplib/index.json
info :   OK https://api.nuget.org/v3/registration5-gz-semver2/sharpziplib/index.json 139ms
info : Restoring packages for C:\wrk\006\tester\tester.csproj...
info : Package 'sharpziplib' is compatible with all the specified frameworks in project 'C:\wrk\006\tester\tester.csproj'.
info : PackageReference for package 'sharpziplib' version '1.3.3' added to file 'C:\wrk\006\tester\tester.csproj'.
info : Writing assets file to disk. Path: C:\wrk\006\tester\obj\project.assets.json
log  : Restored C:\wrk\006\tester\tester.csproj (in 64 ms).

Program.cs:

new ICSharpCode.SharpZipLib.Zip.FastZip().ExtractZip("../006.zip", "output", "");
PS> dotnet run
PS> ls .\output\bar\
    Directory: C:\wrk\006\tester\output\bar

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---          2022-02-23    13:32     1572864000 file1
-a---          2022-02-23    13:33     1572864000 file2
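
The same check can be scripted instead of read off a directory listing. The following is a minimal sketch in Python, not part of the original thread, and the function name `missing_after_extract` is hypothetical. It assumes the archive's central directory is readable by the standard `zipfile` module, and it only compares names and uncompressed sizes, not file contents.

```python
import os
import zipfile


def missing_after_extract(zip_path, output_dir):
    """Return the archive's file entries that are absent from output_dir
    or whose size on disk differs from the size recorded in the archive."""
    missing = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data to compare
            # Entry names always use "/" as the separator inside a zip.
            target = os.path.join(output_dir, *info.filename.split("/"))
            if not os.path.isfile(target) or os.path.getsize(target) != info.file_size:
                missing.append(info.filename)
    return missing
```

Calling `missing_after_extract("../006.zip", "output")` after the extraction should return an empty list when every file entry was written in full.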

piksel, Feb 23 '22 12:02

Thanks for the quick response, @piksel. My tests were also OK when I used FastZip to both create and extract the zip file, regardless of the data size or the file structure.

But I had issues when I tried to extract a zip file that was created by Windows Compression (by right-clicking the files/folders you want to zip -> Send to -> Compressed (zipped) folder). Of course, I used the FastZip.ExtractZip method to extract it. Note: when the files are small, FastZip has no issues extracting them.

Here is the code I used to create the data files:

using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
// Intended for developers only, for experimenting with the code during development.
// (A plain comment is used because XML doc comments are not valid on a namespace.)
namespace DevPlayground
{
    public class Program
    {
        static void Main()
        {
            List<int> fileSizesInMB = new List<int> { 1000, 2000, 1000, 100, 50 };
            // Placeholder: replace with a real folder path before running.
            string folderPath = "PATH_IN_YOUR_DRIVE";
            CreateRandomFiles(fileSizesInMB, folderPath);
        }

        private static void CreateRandomFiles(IEnumerable<int> filesSize, string baseFolder)
        {
            Directory.CreateDirectory(baseFolder);
            var filesAndSizeDic = new Dictionary<string, int>();
            foreach (var size in filesSize)
            {
                string fileName = Guid.NewGuid().ToString();
                string filePath = Path.Combine(baseFolder, fileName);
                filesAndSizeDic[filePath] = size;
            }

            CreateRandomFiles(filesAndSizeDic);
        }

        private static void CreateRandomFiles(Dictionary<string,int> filesAndsizeInMb)
        {
            const int blockSize = 1024 * 8;
            const int blocksPerMb = (1024 * 1024) / blockSize;

            byte[] data = new byte[blockSize];
            foreach (var fileAndSizeInMB in filesAndsizeInMb)
            {
                using (RNGCryptoServiceProvider crypto = new RNGCryptoServiceProvider())
                {
                    using (FileStream stream = File.OpenWrite(fileAndSizeInMB.Key))
                    {
                        for (int i = 0; i < fileAndSizeInMB.Value * blocksPerMb; i++)
                        {
                            crypto.GetBytes(data);
                            stream.Write(data, 0, data.Length);
                        }
                    }
                }
            }
        }
    }
}

WaseemK88, Feb 24 '22 10:02

Windows zip support is awful. I couldn't even create an archive: [image]

Even if I were able to create one, it might be some other specific thing about the .zip that causes the issue. You can generate a report of your file using the ArchiveDiag tool and put the result in a gist (it will create a <ZIPFILE>.zip.html of the output). That way I can take a look at the structure of your file, and perhaps determine what the problem is.

piksel, Feb 25 '22 10:02

Thanks @piksel, I ran the tool on a 5GB+3GB zip file that was created using the "Windows Compression" feature. See the attached output files in ArchiveDiagOut.zip.

WaseemK88, Feb 28 '22 11:02

Yeah, those files use Deflate64, which is a proprietary format that is not supported by SharpZipLib.
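
One way to spot such entries before extracting is to read the compression method that the zip's central directory records for each entry. The following is a minimal sketch in Python using only the standard `zipfile` module, not SharpZipLib; the function name `unsupported_entries` and the default set of "supported" methods (stored and Deflate) are assumptions made for illustration. Method ID 9 is the value the ZIP specification assigns to Deflate64.

```python
import zipfile

DEFLATE64 = 9  # compression method ID for Deflate64 in the ZIP specification


def unsupported_entries(zip_path, supported=(zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED)):
    """Return (name, method_id) for every file entry whose compression
    method is not in `supported`. Only the central directory is read,
    so no entry data is decompressed."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.compress_type)
            for info in archive.infolist()
            if not info.is_dir() and info.compress_type not in supported
        ]
```

Any entry reported with method `DEFLATE64` would need a different tool to extract, or the archive could be recreated with a tool that writes plain Deflate.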

...also, what do you mean by "Windows Encryption"? That the files existed on a BitLocker-encrypted NTFS volume? That has no impact on the zip file, as the files are transparently decrypted when they are read.

piksel, Feb 28 '22 13:02

@piksel, thank you! I meant to say "Windows Compression".

WaseemK88, Mar 01 '22 08:03