Zip extract fails for zips created with Windows
Bug report
Bug description:
import os
from pathlib import Path
from zipfile import ZipFile
def unzip(target_file, destination=Path("/tmp/")):
os.path.altsep = "\\"
with ZipFile(target_file, "r") as zipf:
zipf.extractall(path=destination)
Im building a small web service which allows user to upload zip files containing docs https://github.com/docat-org/docat/blob/main/docat/docat/utils.py#L36 which get unpackt and hosted.
Because of that people create zip files on windows which we try to unpack on linux. Which fails:
Traceback (most recent call last):
File "/home/user/Downloads/example/test.py", line 71, in <module>
unzip(Path("/home/user/Downloads/example/devdocu.zip"))
File "/home/user/Downloads/example/test.py", line 66, in unzip
zipf.extractall(path=destination)
File "/home/user/.asdf/installs/python/3.12.2/lib/python3.12/zipfile/__init__.py", line 1734, in extractall
self._extract_member(zipinfo, path, pwd)
File "/home/user/.asdf/installs/python/3.12.2/lib/python3.12/zipfile/__init__.py", line 1784, in _extract_member
os.makedirs(upperdirs)
File "<frozen os>", line 225, in makedirs
NotADirectoryError: [Errno 20] Not a directory: '/tmp/DevDocu/articles/XXXX/FBRT/BackupAndRestore'
Unfortunately i can't share the actual zip file as it contains sensitive data.
But zip -T thinks it is ok and unzip unpacks this zip without issues.
A bit of digging showed the problem as far as i can tell is in is_dir which claims that
self.filename='DevDocu\\articles\\XXX\\FBRT\\' is_dir? False
is a file and creates an empty file instead of a dir.
CPython versions tested on:
3.12
Operating systems tested on:
Linux
Linked PRs
- gh-117085
- gh-117129
I wonder what program created such files?
On one hand, the ZIP file specification says that / should be used as a path component separator. On other hand, for practical reasons, backslashes are treated as separators too when extracting file on Windows. But ZipInfo.is_dir() only tests for /, so there is inconsistency. It can be considered a bug. ZipInfo.is_dir() should be consistent with the extracting code.
You use a hack by setting os.path.altsep on Linux. It works for now, and I am not going to break this, but it is a hack, and there is no obligation to support this in future.
In future, we should add an option to better control this behavior. It should allow to treat a backslash as a path component separator on non-Windows platforms or as error on Windows, or emit warnings.
I wonder what program created such files?
I think it was powershell and i found this bug https://github.com/PowerShell/Microsoft.PowerShell.Archive/issues/48 but i will do some more digging because i agree windows should just create valid zip files in the first place.
Thank you for reference @fliiiix. I created the test file using GUI, and it produced the file with forward slashes. But it also only adds file entries, not directory entries. Either they use completely different code, or this issue was already fixed.
There is other relevant Python issue: #91036.
It seems that if you create zip files with powershell old enough it fails but i don't have personally a windows to to confirm this