cpython icon indicating copy to clipboard operation
cpython copied to clipboard

zipfile.Path.filename not working when root attribute is io.BytesIO

Open buhtz opened this issue 2 years ago • 4 comments

Using zipfile.Path with a ZIP-file "in memory" as io.BytesIO instance. In that case attributes like .filename() or .__str__() do not work because they try to instanciate a pathlib.Path() with a io.BytesIO object.

I can confirm this with Python 3.11 (run in Docker), Python 3.10.2 (Windows) and Python 3.9.2 (Debian stable). See output below for detailed version infos.

Code to reproduce

#!/usr/bin/env python3
import sys
import io
import pathlib
import zipfile

print('{} {}\n'.format(sys.platform, sys.version))

zip_bytes = io.BytesIO()

# create a zipfile in memory
with zipfile.ZipFile(zip_bytes, 'w') as zf:
    zf.writestr('entry.file', 'content')

# ZIP-Path object from in-memory-ZIP
my_zip_path = zipfile.Path(zip_bytes, 'entry.file')
print('Exists {}'.format(my_zip_path.exists()))

# here comes the Exception
print(my_zip_path)

Output: Python 3.11 on Linux/Docker

Digest: sha256:250990a809a15bb6a3e307fec72dead200c882c940523fb1694baa78eb0e47c6
Status: Downloaded newer image for python:3.11
Python 3.11.1 (main, Dec 21 2022, 18:32:57) [GCC 10.2.1 20210110] on linux
......
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.11/zipfile.py", line 2421, in filename
    return pathlib.Path(self.root.filename).joinpath(self.at)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/pathlib.py", line 871, in __new__
    self = cls._from_parts(args)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/pathlib.py", line 509, in _from_parts
    drv, root, parts = self._parse_args(args)
                       ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/pathlib.py", line 493, in _parse_args
    a = os.fspath(a)
        ^^^^^^^^^^^^
TypeError: expected str, bytes or os.PathLike object, not NoneType

Output: Python 3.10 on Windows

win32 3.10.2 (tags/v3.10.2:a58ebcc, Jan 17 2022, 14:12:15) [MSC v.1929 64 bit (AMD64)]

Exists True
Traceback (most recent call last):
  File "C:\Users\buhtzch\ownCloud\my.work\buhtzology\zippathbug.py", line 28, in <module>
    print(my_zip_path)
  File "C:\Program Files\Python310\lib\zipfile.py", line 2380, in __str__
    return posixpath.join(self.root.filename, self.at)
  File "C:\Program Files\Python310\lib\posixpath.py", line 76, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

Output: Python 3.9.2 on Debian stable (11)

linux 3.9.2 (default, Feb 28 2021, 17:03:44)
[GCC 10.2.1 20210110]

Exists True
Traceback (most recent call last):
  File "/home/user/ownCloud/my.work/buhtzology/zippathbug.py", line 28, in <module>
    print(my_zip_path)
  File "/usr/lib/python3.9/zipfile.py", line 2347, in __str__
    return posixpath.join(self.root.filename, self.at)
  File "/usr/lib/python3.9/posixpath.py", line 76, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

buhtz avatar Dec 29 '22 21:12 buhtz

This is mentioned in the zipfile.Path docstring (though the exception type is wrong):

https://github.com/python/cpython/blob/79c10b7da84f52999dc483fc62c8e758ad3eff23/Lib/zipfile/_path.py#L216-L226

This feature feels fine within the third-party zipp package -- where zipfile.Path originates -- but is perhaps overwrought within a CPython context. IMO it would suffice to make the original ZipFile object available from path objects via an attribute like my_zip_path.zipfile and leave it there.

We may have an opportunity to revise this when we introduce a fully pathlib-compatible implementation of ZipPath. More here: https://discuss.python.org/t/make-pathlib-extensible/3428

barneygale avatar Dec 30 '22 14:12 barneygale

but is perhaps overwrought within a CPython context

I'm not sure I understand the distinction. zipp means to provide the same behavior (just earlier, to older Pythons), so the two implementations should be kept in sync.

IMO it would suffice to make the original ZipFile available...

And it does currently through .root, which has its own issues (the name conflicts with the meaning of pathlib.Path.root).

I don't like the implementation of Path.filename. I'm tempted to deprecate it altogether. __str__ provides something similar but different. .name, .parent and newly .suffix, .suffixes, and .stem all are ill-defined when the zipfile has no filename, so I believe an exception is appropriate in those cases.

But you're right, it's not great to not have a good __str__ representation if the zipfile has no name. I'll work on it.

jaraco avatar Jul 14 '23 19:07 jaraco

Sorry for not being clear. It would suffice to make the original ZipFile available via an attribute. The behaviour of filename, name, and parents at the root is slightly too surprising for a standard library module IMO.

barneygale avatar Jul 14 '23 20:07 barneygale

In jaraco/zipp#107, I determined that it's not going to be straightforward to support these attributes, and the best recommendation is for users to provide a filename on the ZipFile object.

At this point, I'm thinking the best option might be to document these limitations and recommendations.

jaraco avatar Mar 21 '24 02:03 jaraco

In #130120 and the linked PR, @yngvem is exploring this issue. As you can see above, I'd previously concluded:

Users affected by this issue should supply a filename for their zipfile, even if it's just "".

And

the best option might be to document these limitations and recommendations.

@yngvem - what do you think of the recommendation for the caller to set the zipfile .filename? Would that work for your use-case?

What do you think about the following approach:

  1. Document that unnamed zip files will raise TypeErrors when rendered using str and that callers should pass zipfiles with string filenames if they expect for str to work on the Path.
  2. Update the tests to capture this expectation (that str raises a TypeError when the zipfile has no filename and uses the filename even on a memory buffered zipfile).
  3. Maybe also capture the expected repr while we're at it.

jaraco avatar Feb 27 '25 00:02 jaraco

@jaraco, I think that approach seems very reasonable! Though, maybe it would be nice if the TypeError is raised explicitly in Path.__str__, Path.filename and Path._base so it can contain short text explaining why filename is None?

I would be happy to provide all those changes :)

yngvem avatar Mar 28 '25 11:03 yngvem

Are there other parts of the stdlib where __str__() raises? That in particular seems weird to me and probably worth fixing somehow. I'm less bothered by filename, name and parent - I think documenting the workaround would be fine there.

barneygale avatar Mar 28 '25 12:03 barneygale

@barneygale, I also think it seems weird for __str__ to fail, but there are no obvious string representation for unnamed zipfiles (see #130381).

I updated https://github.com/python/cpython/pull/130381 which makes __str__ work (but parent and name still raises errors) based the review comments. I'll make a PR where it has a better exception message as well so you can chose whichever implementation you prefer :)

yngvem avatar Mar 28 '25 13:03 yngvem

I like the idea of using :memory: personally!

:fileobj: might be another option, given the file object might not be backed by memory :)

barneygale avatar Mar 28 '25 13:03 barneygale

A similar-ish case in pathlib comes to mind - pathlib.Path.is_mount() used to unconditionally raise NotImplementedError on Windows. In theory someone could have been relying on that behaviour, but we added Windows support in 3.12 nonetheless.

barneygale avatar Mar 28 '25 14:03 barneygale

I've logged this upstream issue: https://github.com/jaraco/zipp/issues/134

barneygale avatar Apr 16 '25 20:04 barneygale

And an upstream PR: https://github.com/jaraco/zipp/pull/135

barneygale avatar Apr 25 '25 18:04 barneygale