zipfile.Path.filename not working when root attribute is io.BytesIO
Using zipfile.Path with a ZIP-file "in memory" as io.BytesIO instance. In that case attributes like .filename() or .__str__() do not work because they try to instanciate a pathlib.Path() with a io.BytesIO object.
I can confirm this with Python 3.11 (run in Docker), Python 3.10.2 (Windows) and Python 3.9.2 (Debian stable). See output below for detailed version infos.
Code to reproduce
#!/usr/bin/env python3
import sys
import io
import pathlib
import zipfile
print('{} {}\n'.format(sys.platform, sys.version))
zip_bytes = io.BytesIO()
# create a zipfile in memory
with zipfile.ZipFile(zip_bytes, 'w') as zf:
zf.writestr('entry.file', 'content')
# ZIP-Path object from in-memory-ZIP
my_zip_path = zipfile.Path(zip_bytes, 'entry.file')
print('Exists {}'.format(my_zip_path.exists()))
# here comes the Exception
print(my_zip_path)
Output: Python 3.11 on Linux/Docker
Digest: sha256:250990a809a15bb6a3e307fec72dead200c882c940523fb1694baa78eb0e47c6
Status: Downloaded newer image for python:3.11
Python 3.11.1 (main, Dec 21 2022, 18:32:57) [GCC 10.2.1 20210110] on linux
......
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.11/zipfile.py", line 2421, in filename
return pathlib.Path(self.root.filename).joinpath(self.at)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/pathlib.py", line 871, in __new__
self = cls._from_parts(args)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/pathlib.py", line 509, in _from_parts
drv, root, parts = self._parse_args(args)
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/pathlib.py", line 493, in _parse_args
a = os.fspath(a)
^^^^^^^^^^^^
TypeError: expected str, bytes or os.PathLike object, not NoneType
Output: Python 3.10 on Windows
win32 3.10.2 (tags/v3.10.2:a58ebcc, Jan 17 2022, 14:12:15) [MSC v.1929 64 bit (AMD64)]
Exists True
Traceback (most recent call last):
File "C:\Users\buhtzch\ownCloud\my.work\buhtzology\zippathbug.py", line 28, in <module>
print(my_zip_path)
File "C:\Program Files\Python310\lib\zipfile.py", line 2380, in __str__
return posixpath.join(self.root.filename, self.at)
File "C:\Program Files\Python310\lib\posixpath.py", line 76, in join
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
Output: Python 3.9.2 on Debian stable (11)
linux 3.9.2 (default, Feb 28 2021, 17:03:44)
[GCC 10.2.1 20210110]
Exists True
Traceback (most recent call last):
File "/home/user/ownCloud/my.work/buhtzology/zippathbug.py", line 28, in <module>
print(my_zip_path)
File "/usr/lib/python3.9/zipfile.py", line 2347, in __str__
return posixpath.join(self.root.filename, self.at)
File "/usr/lib/python3.9/posixpath.py", line 76, in join
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
This is mentioned in the zipfile.Path docstring (though the exception type is wrong):
https://github.com/python/cpython/blob/79c10b7da84f52999dc483fc62c8e758ad3eff23/Lib/zipfile/_path.py#L216-L226
This feature feels fine within the third-party zipp package -- where zipfile.Path originates -- but is perhaps overwrought within a CPython context. IMO it would suffice to make the original ZipFile object available from path objects via an attribute like my_zip_path.zipfile and leave it there.
We may have an opportunity to revise this when we introduce a fully pathlib-compatible implementation of ZipPath. More here: https://discuss.python.org/t/make-pathlib-extensible/3428
but is perhaps overwrought within a CPython context
I'm not sure I understand the distinction. zipp means to provide the same behavior (just earlier, to older Pythons), so the two implementations should be kept in sync.
IMO it would suffice to make the original
ZipFileavailable...
And it does currently through .root, which has its own issues (the name conflicts with the meaning of pathlib.Path.root).
I don't like the implementation of Path.filename. I'm tempted to deprecate it altogether. __str__ provides something similar but different. .name, .parent and newly .suffix, .suffixes, and .stem all are ill-defined when the zipfile has no filename, so I believe an exception is appropriate in those cases.
But you're right, it's not great to not have a good __str__ representation if the zipfile has no name. I'll work on it.
Sorry for not being clear. It would suffice to make the original ZipFile available via an attribute. The behaviour of filename, name, and parents at the root is slightly too surprising for a standard library module IMO.
In jaraco/zipp#107, I determined that it's not going to be straightforward to support these attributes, and the best recommendation is for users to provide a filename on the ZipFile object.
At this point, I'm thinking the best option might be to document these limitations and recommendations.
In #130120 and the linked PR, @yngvem is exploring this issue. As you can see above, I'd previously concluded:
Users affected by this issue should supply a filename for their zipfile, even if it's just "
".
And
the best option might be to document these limitations and recommendations.
@yngvem - what do you think of the recommendation for the caller to set the zipfile .filename? Would that work for your use-case?
What do you think about the following approach:
- Document that unnamed zip files will raise TypeErrors when rendered using
strand that callers should pass zipfiles with string filenames if they expect forstrto work on thePath. - Update the tests to capture this expectation (that
strraises a TypeError when the zipfile has no filename and uses the filename even on a memory buffered zipfile). - Maybe also capture the expected
reprwhile we're at it.
@jaraco, I think that approach seems very reasonable! Though, maybe it would be nice if the TypeError is raised explicitly in Path.__str__, Path.filename and Path._base so it can contain short text explaining why filename is None?
I would be happy to provide all those changes :)
Are there other parts of the stdlib where __str__() raises? That in particular seems weird to me and probably worth fixing somehow. I'm less bothered by filename, name and parent - I think documenting the workaround would be fine there.
@barneygale, I also think it seems weird for __str__ to fail, but there are no obvious string representation for unnamed zipfiles (see #130381).
I updated https://github.com/python/cpython/pull/130381 which makes __str__ work (but parent and name still raises errors) based the review comments. I'll make a PR where it has a better exception message as well so you can chose whichever implementation you prefer :)
I like the idea of using :memory: personally!
:fileobj: might be another option, given the file object might not be backed by memory :)
A similar-ish case in pathlib comes to mind - pathlib.Path.is_mount() used to unconditionally raise NotImplementedError on Windows. In theory someone could have been relying on that behaviour, but we added Windows support in 3.12 nonetheless.
I've logged this upstream issue: https://github.com/jaraco/zipp/issues/134
And an upstream PR: https://github.com/jaraco/zipp/pull/135