Zip extraction issues

Open ghost opened this issue 4 years ago • 33 comments

Reported by @tonurics on Arch Linux bugtracker (FS#69253).

To clarify, assume you have a zip file called "foo.zip" containing one item: "readme.txt". Taking the same file and instead using the e switch [to extract to the working directory, i.e. 7z e foo.zip], one of two things will happen [depending on the zip file]: sometimes 7z creates a file named "IBM437.so" with the contents of "readme.txt".

Seeing IBM437 makes me think of c104127...

Note: Arch Linux already switched to your p7zip fork and is providing 17.03 in the official repository.

ghost avatar Jan 09 '21 13:01 ghost

I've added some additional comments to the ticket on the Arch Linux bugtracker, along with a zip file you can use to reproduce the issues.

tonurics avatar Jan 09 '21 16:01 tonurics

I've updated the Arch Linux bugtracker with test results from releases and builds created from commits.

I skipped c3c8629b3b6d824c13d62efa5efda9e9139363a8 as it was marked as failing. Having said that, https://github.com/jinfeihan57/p7zip/commit/c104127e6a9364b8d6a1d79012e5249a129c3857 is the first commit where the issue appears, which is also the one @jloqfjgk hinted at above.

tonurics avatar Jan 10 '21 02:01 tonurics

Is this issue Arch-specific, or can it be reproduced on other distros using p7zip built from the same source tree?

PS: Attaching sample file from Arch issue:

I found a small zip file that can be used to reproduce the issue in 17.03-1. Both the x and e switch options will result in the issues I described above. In the case of the e switch: the "IBM437.so" file will appear, and not the error trying to replace the working directory. ComicInfo.zip

unxed avatar Jan 10 '21 08:01 unxed

UPD: Built jinfeihan57/p7zip master on my Mint 20; the issue does not reproduce with the attached sample. Should I try it on Arch instead? What exact distro version should I use?

unxed avatar Jan 10 '21 09:01 unxed

I tested the zip file attached above with the three 7z release binaries posted at: https://github.com/jinfeihan57/p7zip/releases

p7zip 17.01 = No Issue
p7zip 17.02 = No Issue
p7zip 17.03 = Broken

Tested that 17.03 binary on Mint 20, works fine on attached sample. Need to know exact Arch distro version to test on it.

unxed avatar Jan 10 '21 09:01 unxed

Also, Arch Linux is rolling release so there is no version number.

Thanks! Will try to reproduce

unxed avatar Jan 10 '21 09:01 unxed

If I pass env LANG=C then the bug can be reproduced.

Still not reproducing on my Mint, even with this trick. Are you using the sample attached above, ComicInfo.zip?

Btw, can anyone please help with a QUICK way to install Arch in VirtualBox? I tried zen installer; it completed, but the system stays unbootable. Maybe someone can provide a ready-to-run VirtualBox image?

unxed avatar Jan 10 '21 11:01 unxed

Try disabling UEFI in VirtualBox

Already disabled.

follow the official installation guide

It looks like it should take an hour or more. Can't afford it, alas. Isn't there a faster way?

unxed avatar Jan 10 '21 12:01 unxed

Will try preinstalled version from here https://www.osboxes.org/arch-linux/

unxed avatar Jan 10 '21 12:01 unxed

Will try preinstalled version from here https://www.osboxes.org/arch-linux/

Remember to do pacman -Syu so that all packages will be updated to the latest version.

Damn, it refuses to work also

[Screenshot: VirtualBox_arch_10_01_2021_16_14_57]

unxed avatar Jan 10 '21 13:01 unxed

Had to do the following for the full upgrade to complete:

  1. From https://bbs.archlinux.org/viewtopic.php?pid=1653252#p1653252: edit /etc/pacman.d/gnupg/gpg.conf and change the keyserver line to:
       keyserver hkp://keyserver.ubuntu.com

  2. From https://wiki.archlinux.org/index.php/Pacman/Package_signing#Cannot_import_keys:
       sudo pacman-key --init
       sudo pacman-key --populate
       sudo pacman-key --refresh-keys
       sudo pacman -Sy archlinux-keyring

  3. Then run the upgrade itself:
       sudo pacman -Syu

That Arch is a bit of a tricky thing :)

unxed avatar Jan 10 '21 15:01 unxed

Trying to unzip the attached sample looks a bit buggy (it asks to overwrite a non-existent file), but the extraction itself works OK: no .so file is created. [Screenshots: VirtualBox_arch_10_01_2021_18_45_20, VirtualBox_arch_10_01_2021_18_48_12]

unxed avatar Jan 10 '21 15:01 unxed

Maybe I'm doing something wrong or need some additional steps to reproduce IBM437.so creation? Exact image I used: https://nav.dl.sourceforge.net/project/osboxes/v/vb/4-Ar---c-x/2019.05/KDE/201905-kde-64bit.7z

unxed avatar Jan 10 '21 15:01 unxed

For completeness, here are my localization settings:

/etc/locale.gen:
  en_US.UTF-8 UTF-8
  en_US ISO-8859-1
[If you change this: run locale-gen and restart.]

/etc/locale.conf:
  LANG=en_US.UTF-8
[If you change this: restart.]

@unxed You appear to be using only the x [eXtract files with full paths] command, which creates superfluous directories where it should be creating files.

The e [Extract files from archive (without using directory names)] command, on the other hand, does one of two things [depending on the zip archive]. It will either try to replace the working directory [and fail] or create a file named "IBM437.so" with the contents of one of the file entries in the zip archive.

You can read my ticket on the Arch Linux bugtracker here: https://bugs.archlinux.org/task/69253?project=1&string=p7zip

tonurics avatar Jan 10 '21 17:01 tonurics

As for your Arch installation, I'm not sure you have a fully working system. If you continue to run into problems reproducing the issue:
*) Revert the changes you made to: /etc/pacman.d/gnupg/gpg.conf
*) Re-initialize the pacman keyring: pacman-key --init
*) Fetch the archlinux signing keys: pacman-key --populate archlinux
*) Clear out any junk: pacman-key --refresh-keys
*) Force update of repos and packages [note "yy"]: pacman -Syyu

As Arch is a rolling release, an image that hasn't been updated for some time can get stuck in a state where packages can't be updated, and the best option is to install a fresh copy [which is generally pretty quick if you follow along with the Installation guide; you might even have been able to use the install media as a sort of CLI-only live CD]. Having said all that, thanks for taking the time to get Arch running for testing!

tonurics avatar Jan 10 '21 17:01 tonurics

I can confirm this is happening on Arch with 17.03

Arch doesn't do anything strange to build it though:
make OPTFLAGS="$CPPFLAGS $CFLAGS" 7z 7zr 7za
See the PKGBUILD here: https://github.com/archlinux/svntogit-packages/blob/packages/p7zip/trunk/PKGBUILD

txtsd avatar Jan 10 '21 18:01 txtsd

At the end of the new code added in c104127e6a9364b8d6a1d79012e5249a129c3857, res seems to contain:

ComicInfo.xml(NUL)(NUL)(NUL): gconv_end(NUL)/usr/lib/gconv/IBM437.so

So the IBM437.so filename originates from garbage at the end of the output string. Perhaps s_utf8 can be trimmed to the appropriate length thus avoiding ConvertUTF8ToUnicode carrying over the unwanted characters to the output string?

The following approach correctly extracts ComicInfo.xml but my understanding of the code is very limited so it could be going about it the wrong way. (Also removed the + 1 from the buffer allocation because the *_SetEnd methods already handle null termination from what I gathered.)

diff --git a/CPP/7zip/Archive/Zip/ZipItem.cpp b/CPP/7zip/Archive/Zip/ZipItem.cpp
index 353e895..4d48296 100644
--- a/CPP/7zip/Archive/Zip/ZipItem.cpp
+++ b/CPP/7zip/Archive/Zip/ZipItem.cpp
@@ -423,10 +423,10 @@ void CItem::GetUnicodeString(UString &res, const AString &s, bool isComment, boo
       const char* src = s.Ptr();
       size_t slen = s.Len();
       size_t dlen = slen * 4;
-      const char* dest = s_utf8.GetBuf_SetEnd(dlen + 1); // (source length * 4) + null termination
+      const char* dest = s_utf8.GetBuf_SetEnd(dlen); // (source length * 4)
 
       size_t done = iconv(cd, (char**)&src, &slen, (char**)&dest, &dlen);
-      bzero((size_t*)dest + done, 1);
+      s_utf8.ReleaseBuf_SetEnd(s_utf8.Len() - dlen);
 
       iconv_close(cd);
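
For readers following the patch: iconv(3) reports progress by advancing the output pointer and decrementing the remaining-output counter, while its return value is the number of non-reversible conversions, not the number of bytes written. The standalone sketch below is not p7zip code. `to_utf8` is a hypothetical helper that uses std::string in place of AString, and it only illustrates sizing the result from the leftover output space, which is the same idea as the ReleaseBuf_SetEnd call in the diff.

```cpp
#include <iconv.h>

#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of p7zip: converts `input` from `from_charset`
// to UTF-8. Returns false if the charset is unknown or the conversion fails.
bool to_utf8(const std::string& input, const char* from_charset, std::string& out) {
    iconv_t cd = iconv_open("UTF-8", from_charset);
    if (cd == (iconv_t)-1) {
        return false;
    }

    // Assumed worst case: 4 output bytes per input byte, as in the diff.
    std::string buf(input.size() * 4, '\0');
    char* src = const_cast<char*>(input.data());
    std::size_t src_left = input.size();
    char* dst = &buf[0];
    std::size_t dst_left = buf.size();

    // iconv advances src/dst and decrements src_left/dst_left as it goes.
    const std::size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (std::size_t)-1) {
        return false;
    }

    // Bytes written = capacity minus the space still unused. Trimming to this
    // length keeps the untouched tail of the buffer out of the result.
    buf.resize(buf.size() - dst_left);
    out.swap(buf);
    return true;
}
```

A caller would pass the raw entry name and the archive's code page, for example `to_utf8(raw_name, "CP437", name)`; whether a given charset name is accepted depends on the local iconv implementation.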
 

foutrelis avatar Jan 10 '21 23:01 foutrelis

I don't work on Arch Linux. Has anybody solved this issue? Or does Arch Linux have a compile macro (like APPLE), so we can skip the code?

jinfeihan57 avatar Jan 15 '21 07:01 jinfeihan57

For me, testing 17.03 outputs
$ 7z x -y ComicInfo.zip => ComicInfo.xml
but
$ 7z e ComicInfo.zip => IBM437.s

$ file ComicInfo.xml
ComicInfo.xml: XML 1.0 document, ASCII text
and
$ file IBM437.s
IBM437.s: XML 1.0 document, ASCII text

biopsin avatar Jan 15 '21 08:01 biopsin

@biopsin I assume you are still seeing the erroneous folder creation problem with the x [eXtract files with full paths] command, even though you didn't give it a new output path; i.e. 7z x -o/tmp/1 foo.zip. Your use of -y [Assume Yes on all queries] would have 7z silently replace the folders with files.

I would be cautious of using the -y switch with e [Extract files from archive (without using directory names)] command, if a zip archive has multiple file entries: they will all want to overwrite each other into the one "IBM437.so" file.

tonurics avatar Jan 15 '21 08:01 tonurics

Yes I see it; extracting with 7z x $1 but not with 7z e $1

biopsin avatar Jan 15 '21 10:01 biopsin

Or does Arch Linux have a compile macro (like APPLE), so we can skip the code?

I wouldn't consider this to be an Arch Linux issue. More likely this is related to newer libstdc++ or some other core library.

For what it's worth, this is also reproducible on Fedora 32 and Fedora 33 (7z e ComicInfo.zip creates IBM437.s). I would conclude that it happens with GCC 10's libstdc++ but Ubuntu 20.10 appears to be unaffected.

@unxed: Could you test on Fedora 33 (or Fedora 32) which works as a live disc and doesn't require installation/configuration?

foutrelis avatar Jan 15 '21 11:01 foutrelis

I just tested the following distros livecds in VirtualBox [except Arch] using default settings and the compiled p7zip 17.03 release:

Distro        Based on        Result  File created by e  glibc Version
Sparky Linux  Debian Testing  Pass    N/A                2.31
Arch Linux    Independent     Fail    IBM437.so          2.32
Clear Linux   Independent     Fail    IBM437.s           2.31
PCLinuxOS     Independent     Fail    IBM437.s           2.31
Solus         Independent     Fail    IBM437.s           2.29

Note: Seeing that Sparky Linux [which is Debian Testing] passed, I would assume that all other Debian-based distros would likely pass as well, so I stuck to testing independent upstream distros with recent releases.

tonurics avatar Jan 15 '21 21:01 tonurics

It doesn't really matter why Debian doesn't repro this exact bug. Increasing the length/size of s_utf8 to something like 1024 bytes results in an incorrect filename on Debian 10:

root@debian:~# ls
'ComicInfo>'$'\n''PK'$'\001\002''?'   ComicInfo.zip   p7zip

And on second run:

root@debian:~# ./p7zip/bin/7z e ComicInfo.zip
...
Would you like to replace the existing file:
  Path:     ./ComicInfo>
PK?
  Size:     590 bytes (1 KiB)
  Modified: 2001-01-01 09:01:00
with the file from archive:
  Path:     ComicInfo.xml
...

Either a ReleaseBuf_SetEnd call is needed to trim s_utf8 (see https://github.com/jinfeihan57/p7zip/issues/112#issuecomment-757566333) or other code needs to be adjusted to ignore the random data in the string buffer after the first NUL character.
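
The second option (ignoring everything after the first NUL) can be sketched as follows. This is a standalone illustration using std::string rather than p7zip's AString/UString, and `trim_at_first_nul` and `sample_buffer` are hypothetical names, not existing functions in the code base.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of p7zip: returns the part of `buf` before the
// first NUL byte, or all of `buf` if it contains none.
std::string trim_at_first_nul(const std::string& buf) {
    const std::size_t pos = buf.find('\0');
    return pos == std::string::npos ? buf : buf.substr(0, pos);
}

// Rebuilds the buffer contents reported earlier in the thread, with the
// "(NUL)" markers turned into real NUL bytes.
std::string sample_buffer() {
    return std::string("ComicInfo.xml") + std::string(3, '\0') +
           ": gconv_end" + std::string(1, '\0') + "/usr/lib/gconv/IBM437.so";
}
```

Applied to `sample_buffer()`, the helper keeps only `ComicInfo.xml` and drops the gconv path that ends up as the bogus "IBM437.so" filename.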

foutrelis avatar Jan 16 '21 09:01 foutrelis

Can this bug be reproduced on musl-based distributions, such as Alpine Linux?

@jloqfjgk [All your comments appear to have been deleted.] For edification: I was actually curious about that myself and did try to test it. But the 17.03 release binary didn't run. And not knowing enough about Alpine/musl: didn't want to try compiling it myself and create any red herrings or false feedback.

tonurics avatar Jan 16 '21 10:01 tonurics

What is the status of this issue? Do you want me to create a Dockerfile to reproduce this issue?

buzztaiki avatar Apr 17 '21 13:04 buzztaiki

On Ubuntu 20.04 I have this issue with Zip extraction.

$ wget -q https://github.com/jinfeihan57/p7zip/archive/refs/heads/master.zip
$ ls -la
total 7352
drwxrwxr-x  2 pavel pavel      60 May 28 20:15 .
drwxrwxrwt 27 root  root      800 May 28 20:09 ..
-rw-rw-r--  1 pavel pavel 7528082 May 28 19:52 master.zip
$ 7z x master.zip 

7-Zip [64] 17.04 : Copyright (c) 1999-2021 Igor Pavlov : 2017-08-28
p7zip Version 17.04 (locale=ru_RU.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs Intel(R) Xeon(R) CPU           L5420  @ 2.50GHz (1067A),ASM)

Scanning the drive for archives:
1 file, 7528082 bytes (7352 KiB)

Extracting archive: master.zip
--
Path = master.zip
Type = zip
Physical Size = 7528082
Comment = eb1bbb0d0327a103850fec519015986e72a1ebf0

    
Would you like to replace the existing file:
  Path:     ./p7zip-master/.github/workflows/macos-build.yml
  Size:     0 bytes
  Modified: 2021-05-28 19:52:47
with the file from archive:
  Path:     p7zip-master/.github/workflows/macos-build.yml
  Size:     3024 bytes (3 KiB)
  Modified: 2021-05-17 13:21:36
? (Y)es / (N)o / (A)lways / (S)kip all / A(u)to rename all / (Q)uit? q

Archives with Errors: 1



Break signaled
$ 7za x master.zip 

7-Zip (a) [64] 17.04 : Copyright (c) 1999-2021 Igor Pavlov : 2017-08-28
p7zip Version 17.04 (locale=ru_RU.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs Intel(R) Xeon(R) CPU           L5420  @ 2.50GHz (1067A),ASM)

Scanning the drive for archives:
1 file, 7528082 bytes (7352 KiB)

Extracting archive: master.zip
--
Path = master.zip
Type = zip
Physical Size = 7528082
Comment = eb1bbb0d0327a103850fec519015986e72a1ebf0

    
Would you like to replace the existing file:
  Path:     ./p7zip-master/.github/workflows/linux- build.yml
  Size:     2970 bytes (3 KiB)
  Modified: 2021-05-17 13:21:36
with the file from archive:
  Path:     p7zip-master/.github/workflows/linux- build.yml
  Size:     2970 bytes (3 KiB)
  Modified: 2021-05-17 13:21:36
? (Y)es / (N)o / (A)lways / (S)kip all / A(u)to rename all / (Q)uit? q

Archives with Errors: 1



Break signaled

spvkgn avatar May 28 '21 15:05 spvkgn

On Ubuntu 20.04 I have this issue with Zip extraction.

I reverted commit c104127e6a9364b8d6a1d79012e5249a129c3857 and the issue is gone.

spvkgn avatar May 29 '21 17:05 spvkgn

Hi, I don't know whether this is related to the same issue: I've found that some ZIP archives are not extracted correctly with p7zip 17.04, but they are extracted correctly with the vanilla p7zip 16.02.

I've tested the issue on Ubuntu 18.04 and 20.04 x86_64.

Some examples of archives showing the issue:
https://github.com/pkoutoupis/rapiddisk/archive/refs/tags/7.2.1.zip
https://www.tosecdev.org/downloads/category/53-2021-08-08?download=105:tosec-dat-pack-complete-3246-tosec-v2021-08-08

Extraction ends successfully but part of the content is not extracted using p7zip 17.04 - e.g. some files are extracted as empty folders, some files are not extracted at all - and I can't find anything about the error(s) in the log of the operation.

Extracting same archives with the vanilla p7zip 16.02, or with other archive managers, produces the correct number of files and folders as output.

Listing and testing the archives with p7zip 17.04 is successful and no issues (e.g. data errors) are reported for the archives.
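
One way to make the "missing files / files extracted as empty folders" symptom visible is to compare the entry list of a known-good extraction (for example from 16.02) against the one produced by 17.04. The sketch below assumes both listings have already been collected into maps from relative path to an is-directory flag; how they are gathered is left out, and `Listing` and `compare_listings` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Relative path -> true if the entry is a directory, false if it is a file.
using Listing = std::map<std::string, bool>;

// Hypothetical helper: reports every entry of `expected` that is missing from
// `actual` or that changed type (e.g. a file that came out as a directory).
std::vector<std::string> compare_listings(const Listing& expected, const Listing& actual) {
    std::vector<std::string> problems;
    for (const auto& entry : expected) {
        const auto found = actual.find(entry.first);
        if (found == actual.end()) {
            problems.push_back("missing: " + entry.first);
        } else if (found->second != entry.second) {
            problems.push_back(std::string(entry.second ? "directory became file: "
                                                        : "file became directory: ") +
                               entry.first);
        }
    }
    return problems;
}
```

An empty result means every expected entry exists with the expected type; entries that appear only in `actual` are deliberately not reported in this sketch.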

peazip avatar Oct 02 '21 13:10 peazip

@peazip I just tested extracting 7.2.1.zip: du -sh rapiddisk-7.2.1/ yields 464K with 17.04, and the other one yields 357M. Does this sound correct to you? Note that I'm currently testing with the change from https://github.com/jinfeihan57/p7zip/issues/112#issuecomment-757566333

biopsin avatar Oct 02 '21 14:10 biopsin