cpython icon indicating copy to clipboard operation
cpython copied to clipboard

Zip extract fails for zips created with Windows

Open fliiiix opened this issue 1 year ago • 1 comments

Bug report

Bug description:

import os
from pathlib import Path
from zipfile import ZipFile

def unzip(target_file, destination=Path("/tmp/")):
    os.path.altsep = "\\"
    with ZipFile(target_file, "r") as zipf:
        zipf.extractall(path=destination)

Im building a small web service which allows user to upload zip files containing docs https://github.com/docat-org/docat/blob/main/docat/docat/utils.py#L36 which get unpackt and hosted.

Because of that people create zip files on windows which we try to unpack on linux. Which fails:

Traceback (most recent call last):
  File "/home/user/Downloads/example/test.py", line 71, in <module>
    unzip(Path("/home/user/Downloads/example/devdocu.zip"))
  File "/home/user/Downloads/example/test.py", line 66, in unzip
    zipf.extractall(path=destination)
  File "/home/user/.asdf/installs/python/3.12.2/lib/python3.12/zipfile/__init__.py", line 1734, in extractall
    self._extract_member(zipinfo, path, pwd)
  File "/home/user/.asdf/installs/python/3.12.2/lib/python3.12/zipfile/__init__.py", line 1784, in _extract_member
    os.makedirs(upperdirs)
  File "<frozen os>", line 225, in makedirs
NotADirectoryError: [Errno 20] Not a directory: '/tmp/DevDocu/articles/XXXX/FBRT/BackupAndRestore'

Unfortunately i can't share the actual zip file as it contains sensitive data. But zip -T thinks it is ok and unzip unpacks this zip without issues.

A bit of digging showed the problem as far as i can tell is in is_dir which claims that

self.filename='DevDocu\\articles\\XXX\\FBRT\\' is_dir? False

is a file and creates an empty file instead of a dir.

CPython versions tested on:

3.12

Operating systems tested on:

Linux

Linked PRs

  • gh-117085
  • gh-117129

fliiiix avatar Mar 20 '24 15:03 fliiiix

I wonder what program created such files?

On one hand, the ZIP file specification says that / should be used as a path component separator. On other hand, for practical reasons, backslashes are treated as separators too when extracting file on Windows. But ZipInfo.is_dir() only tests for /, so there is inconsistency. It can be considered a bug. ZipInfo.is_dir() should be consistent with the extracting code.

You use a hack by setting os.path.altsep on Linux. It works for now, and I am not going to break this, but it is a hack, and there is no obligation to support this in future.

In future, we should add an option to better control this behavior. It should allow to treat a backslash as a path component separator on non-Windows platforms or as error on Windows, or emit warnings.

serhiy-storchaka avatar Mar 21 '24 15:03 serhiy-storchaka

I wonder what program created such files?

I think it was powershell and i found this bug https://github.com/PowerShell/Microsoft.PowerShell.Archive/issues/48 but i will do some more digging because i agree windows should just create valid zip files in the first place.

fliiiix avatar Mar 22 '24 07:03 fliiiix

Thank you for reference @fliiiix. I created the test file using GUI, and it produced the file with forward slashes. But it also only adds file entries, not directory entries. Either they use completely different code, or this issue was already fixed.

There is other relevant Python issue: #91036.

serhiy-storchaka avatar Mar 22 '24 10:03 serhiy-storchaka

It seems that if you create zip files with powershell old enough it fails but i don't have personally a windows to to confirm this

fliiiix avatar Mar 22 '24 14:03 fliiiix