Zip extraction issues
Reported by @tonurics on Arch Linux bugtracker (FS#69253).
To clarify, assume you have zip file called "foo.zip" containing one item: "readme.txt" Taking the same file above and instead using switch e [to extract to the working directory, i.e
7z e foo.zip], one of two things will happen [depending on the zip file]: sometimes 7z creates a file named "IBM437.so" with the contents of "readme.txt".
Seeing IBM437 makes me think of c104127...
Note: Arch Linux already switched to your p7zip fork and is providing 17.03 in the official repository.
I've added some additional comments to the ticket on the Arch Linux bugtracker, along with a zip file you can use to reproduce the issues.
I've updated the Arch Linux bugtracker with test results from releases and builds created from commits.
I skipped c3c8629b3b6d824c13d62efa5efda9e9139363a8 as it was marked as failing. Having said that https://github.com/jinfeihan57/p7zip/commit/c104127e6a9364b8d6a1d79012e5249a129c3857 is the first commit where the issue appears; which is also the one @jloqfjgk hinted at above.
Is this issue Arch-specific or it is reproduced on other distros using p7zip built from the same source tree?
PS: Attaching sample file from Arch issue:
I found a small zip file that can be used to reproduce the issue in 17.03-1. Both the x and e switch options will result in the issues I described above. In the case of the e switch: the "IBM437.so" file will appear, and not the error trying to replace the working directory. ComicInfo.zip
UPD: Built jinfeihan57/p7zip master on my Mint 20, issue is not reproducing on attached sample. Should I do it on Arch instead? What exact distro version should I use?
I tested the zip file attached above with the three 7z release binaries posted at: https://github.com/jinfeihan57/p7zip/releases
p7zip 17.01 = No Issue p7zip 17.02 = No Issue p7zip 17.03 = Broken
Tested that 17.03 binary on Mint 20, works fine on attached sample. Need to know exact Arch distro version to test on it.
Also, Arch Linux is rolling release so there is no version number.
Thanks! Will try to reproduce
If I pass
env LANG=Cthen the bug can be reproduced.
Still not reproducting on my Mint even with this trick. Do you use sample attached above, ComicInfo.zip?
Btw, can anyone please help with QUICK way to install Arch in VirtualBox? Tried zen installer, it completed, but system stays unbootable. Maybe anyone can provide ready to run virtualbox image?
Try disabling UEFI in VirtualBox
Already disabled.
follow the official installation guide
It looks like it should take an hour or more. Can't afford it, alas. Isn't there a faster way?
Will try preinstalled version from here https://www.osboxes.org/arch-linux/
Will try preinstalled version from here https://www.osboxes.org/arch-linux/ Remember to do
pacman -Syuso that all packages will be updated to the latest version.
Damn, it refuses to work also

Had to do the following:
-
from https://bbs.archlinux.org/viewtopic.php?pid=1653252#p1653252 edit /etc/pacman.d/gnupg/gpg.conf and change the keyserver line to: keyserver hkp://keyserver.ubuntu.com
-
from https://wiki.archlinux.org/index.php/Pacman/Package_signing#Cannot_import_keys sudo pacman-key --init sudo pacman-key --populate sudo pacman-key --refresh-keys sudo pacman -Sy archlinux-keyring
sudo pacman -Syu
For full upgrade to complete. That Arch is a bit tricky thing :)
Trying to unzip attached sample looks a bit buggy (is askes to overwrite non-existent file), but extraction itself works ok no .so file is created.

Maybe I'm doing something wrong or need some additional steps to reproduce IBM437.so creation? Exact image I used: https://nav.dl.sourceforge.net/project/osboxes/v/vb/4-Ar---c-x/2019.05/KDE/201905-kde-64bit.7z
For completeness, here are my localization settings:
/etc/locale.gen
en_US.UTF-8 UTF-8
en_US ISO-8859-1
[If you change this: run locale-gen and restart.]
/etc/locale.conf
LANG=en_US.UTF-8
[If you change this: restart.]
@unxed You appear to only be using the x [eXtract files with full paths] command; which creates superfluous directories, where instead, it should be should be creating files.
The e [Extract files from archive (without using directory names)] command on the other hand does one of two things [depending on the zip archive]. It will either: try to replace the working directory [and fail] or create a file named "IBM437.so" with the contents of one the the file entries in the zip archive.
You can read my ticket on the Arch Linux bugtracker here: https://bugs.archlinux.org/task/69253?project=1&string=p7zip
As for your Arch installation, I'm not sure you have a fully working system. If you continue to run into problems reproducing the issue:
*) Revert the changes you made to: /etc/pacman.d/gnupg/gpg.conf
*) Re-initialize the pacman keyring: pacman-key --init
*) Fetch the archlinux signing keys: pacman-key --populate archlinux
*) Clear out any junk: pacman-key --refresh-keys
*) Force update of repos and packages [note "yy"]: pacman -Syyu
As Arch is a rolling release, if you have an image that hasn't been updated for some time: it can get stuck where new packages can't be updated and the best option is to install a fresh copy [which is generally pretty quick, if you follow along with the Installation guide; you might even have just been able to use the install media as a sort of CLI only live CD]. Having said all that, things for taking time to get Arch running for testing!
I can confirm this is happening on Arch with 17.03
Arch doesn't do anything strange to build it though: make OPTFLAGS="$CPPFLAGS $CFLAGS" 7z 7zr 7za
See the PKGBUILD here: https://github.com/archlinux/svntogit-packages/blob/packages/p7zip/trunk/PKGBUILD
At the end of the new code added in c104127e6a9364b8d6a1d79012e5249a129c3857, res seems to contain:
ComicInfo.xml(NUL)(NUL)(NUL): gconv_end(NUL)/usr/lib/gconv/IBM437.so
So the IBM437.so filename originates from garbage at the end of the output string. Perhaps s_utf8 can be trimmed to the appropriate length thus avoiding ConvertUTF8ToUnicode carrying over the unwanted characters to the output string?
The following approach correctly extracts ComicInfo.xml but my understanding of the code is very limited so it could be going about it the wrong way. (Also removed the + 1 from the buffer allocation because the *_SetEnd methods already handle null termination from what I gathered.)
diff --git a/CPP/7zip/Archive/Zip/ZipItem.cpp b/CPP/7zip/Archive/Zip/ZipItem.cpp
index 353e895..4d48296 100644
--- a/CPP/7zip/Archive/Zip/ZipItem.cpp
+++ b/CPP/7zip/Archive/Zip/ZipItem.cpp
@@ -423,10 +423,10 @@ void CItem::GetUnicodeString(UString &res, const AString &s, bool isComment, boo
const char* src = s.Ptr();
size_t slen = s.Len();
size_t dlen = slen * 4;
- const char* dest = s_utf8.GetBuf_SetEnd(dlen + 1); // (source length * 4) + null termination
+ const char* dest = s_utf8.GetBuf_SetEnd(dlen); // (source length * 4)
size_t done = iconv(cd, (char**)&src, &slen, (char**)&dest, &dlen);
- bzero((size_t*)dest + done, 1);
+ s_utf8.ReleaseBuf_SetEnd(s_utf8.Len() - dlen);
iconv_close(cd);
I don't work on arch linux. Anybody solve this issue? OR does arch linux have a compile macro(like APPLE),so we can skipe the code.
For me 17.03 testing outputs
$ 7z x -y ComicInfo.zip => ComicInfo.xml
but
$ 7z e ComicInfo.zip => IBM437.s
$ file ComicInfo.xml
ComicInfo.xml: XML 1.0 document, ASCII text
and
$ file IBM437.s
IBM437.s: XML 1.0 document, ASCII text
@biopsin I assume you are seeing still seeing the erroneous folder creation problem with the x [eXtract files with full paths] command, even though you didn't give it a new output path; i.e. 7z x -o/tmp/1 foo.zip. Your use of -y [Assume Yes on all queries] would have 7z silently replace the folders with files.
I would be cautious of using the -y switch with e [Extract files from archive (without using directory names)] command, if a zip archive has multiple file entries: they will all want to overwrite each other into the one "IBM437.so" file.
Yes I see it; extracting with 7z x $1 but not with 7z e $1
OR does arch linux have a compile macro(like APPLE),so we can skipe the code.
I wouldn't consider this to be an Arch Linux issue. More likely this is related to newer libstdc++ or some other core library.
For what it's worth, this is also reproducible on Fedora 32 and Fedora 33 (7z e ComicInfo.zip creates IBM437.s). I would conclude that it happens with GCC 10's libstdc++ but Ubuntu 20.10 appears to be unaffected.
@unxed: Could you test on Fedora 33 (or Fedora 32) which works as a live disc and doesn't require installation/configuration?
I just tested the following distros livecds in VirtualBox [except Arch] using default settings and the compiled p7zip 17.03 release:
| Distro | Based on | Result | e Filename |
glibc Version |
|---|---|---|---|---|
| Sparky Linux | Debian Testing | Pass | N/A | 2.31 |
| Arch Linux | Independent | Fail | IBM437.so | 2.32 |
| Clear Linux | Independent | Fail | IBM437.s | 2.31 |
| PCLinuxOS | Independent | Fail | IBM437.s | 2.31 |
| Solus | Independent | Fail | IBM437.s | 2.29 |
Note: Seeing that Sparky Linux passed [which is Debian Testing], I would assume: that all other Debian distros based would likely pass. So stuck to testing independent upstream source distros with recent releases.
It doesn't really matter why Debian doesn't repro this exact bug. Increasing the length/size of s_utf8 to something like 1024 bytes results in incorrect filename on Debian 10:
root@debian:~# ls
'ComicInfo>'$'\n''PK'$'\001\002''?' ComicInfo.zip p7zip
And on second run:
root@debian:~# ./p7zip/bin/7z e ComicInfo.zip
...
Would you like to replace the existing file:
Path: ./ComicInfo>
PK?
Size: 590 bytes (1 KiB)
Modified: 2001-01-01 09:01:00
with the file from archive:
Path: ComicInfo.xml
...
Either a ReleaseBuf_SetEnd call is needed to trim s_utf8 (see https://github.com/jinfeihan57/p7zip/issues/112#issuecomment-757566333) or other code needs to be adjusted to ignore the random data in the string buffer after the first NUL character.
Can this bug be reproduced on musl-based distributions, such as Alpine Linux?
@jloqfjgk [All your comments appear to have been deleted.] For edification: I was actually curious about that myself and did try to test it. But the 17.03 release binary didn't run. And not knowing enough about Alpine/musl: didn't want to try compiling it myself and create any red herrings or false feedback.
What is the status of this issue? Do you want me to create a Dockerfile to reproduce this issue?
On Ubuntu 20.04 I have this issue with Zip extraction.
$ wget -q https://github.com/jinfeihan57/p7zip/archive/refs/heads/master.zip
$ ls -la
итого 7352
drwxrwxr-x 2 pavel pavel 60 мая 28 20:15 .
drwxrwxrwt 27 root root 800 мая 28 20:09 ..
-rw-rw-r-- 1 pavel pavel 7528082 мая 28 19:52 master.zip
$ 7z x master.zip
7-Zip [64] 17.04 : Copyright (c) 1999-2021 Igor Pavlov : 2017-08-28
p7zip Version 17.04 (locale=ru_RU.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs Intel(R) Xeon(R) CPU L5420 @ 2.50GHz (1067A),ASM)
Scanning the drive for archives:
1 file, 7528082 bytes (7352 KiB)
Extracting archive: master.zip
--
Path = master.zip
Type = zip
Physical Size = 7528082
Comment = eb1bbb0d0327a103850fec519015986e72a1ebf0
Would you like to replace the existing file:
Path: ./p7zip-master/.github/workflows/macos-build.yml
Size: 0 bytes
Modified: 2021-05-28 19:52:47
with the file from archive:
Path: p7zip-master/.github/workflows/macos-build.yml
Size: 3024 bytes (3 KiB)
Modified: 2021-05-17 13:21:36
? (Y)es / (N)o / (A)lways / (S)kip all / A(u)to rename all / (Q)uit? q
Archives with Errors: 1
Break signaled
$ 7za x master.zip
7-Zip (a) [64] 17.04 : Copyright (c) 1999-2021 Igor Pavlov : 2017-08-28
p7zip Version 17.04 (locale=ru_RU.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs Intel(R) Xeon(R) CPU L5420 @ 2.50GHz (1067A),ASM)
Scanning the drive for archives:
1 file, 7528082 bytes (7352 KiB)
Extracting archive: master.zip
--
Path = master.zip
Type = zip
Physical Size = 7528082
Comment = eb1bbb0d0327a103850fec519015986e72a1ebf0
Would you like to replace the existing file:
Path: ./p7zip-master/.github/workflows/linux- build.yml
Size: 2970 bytes (3 KiB)
Modified: 2021-05-17 13:21:36
with the file from archive:
Path: p7zip-master/.github/workflows/linux- build.yml
Size: 2970 bytes (3 KiB)
Modified: 2021-05-17 13:21:36
? (Y)es / (N)o / (A)lways / (S)kip all / A(u)to rename all / (Q)uit? q
Archives with Errors: 1
Break signaled
On Ubuntu 20.04 I have this issue with Zip extraction.
I've tried to revert commit c104127e6a9364b8d6a1d79012e5249a129c3857 and that issue has gone.
Hi, I don't know if it is related with the same issue: I've tested that some ZIP archives are not extracted correctly with p7zip 17.04, but they are correctly extracted using the vanilla p7zip 16.02,
I've tested the issue on Ubuntu 18.04 and 20.04 x86_64.
Some example of the archives showing the issue are: https://github.com/pkoutoupis/rapiddisk/archive/refs/tags/7.2.1.zip https://www.tosecdev.org/downloads/category/53-2021-08-08?download=105:tosec-dat-pack-complete-3246-tosec-v2021-08-08
Extraction ends successfully but part of the content is not extracted using p7zip 17.04 - e.g. some files are extracted as empty folders, some files are not extracted at all - and I can't find anything about the error(s) in the log of the operation.
Extracting same archives with the vanilla p7zip 16.02, or with other archive managers, produces the correct number of files and folders as output.
Listing and testing the archives with p7zip 17.04 is successful and no issues (e.g. data errors) are reported for the archives.
@peazip just tested extracting 7.2.1.zip and du -sh rapiddisk-7.2.1/ = yealds 464K with 17.04, but I'm however currently testing with https://github.com/jinfeihan57/p7zip/issues/112#issuecomment-757566333
and the other one 357M - does this sound correct to you?