pip
pip copied to clipboard
WheelDistribution doesn't implement abstract methods of importlib.metadata.Distribution
Description
In https://github.com/python/cpython/pull/100466, we learned that pip implements a WheelDistribution derived from importlib.metadata.Distribution, but doesn't implement the abstract methods. This usage was allowed because the ABC didn't derive from abc.ABCMeta. I don't recall if that choice was intentional or not, but attempting to enforce the definition of abstract methods in subclasses breaks pip invocations.
Expected behavior
Pip should implement the abstract methods (even if just trivially).
pip version
22.3.1
Python version
3.12a3ish
OS
all
How to Reproduce
Apply the patch from the referenced Python PR to the Python stdlib and invoke pip install.
Output
No response
Code of Conduct
- [X] I agree to follow the PSF Code of Conduct.
It should be noted that partial implementation of the Distribution interface is now deprecated: https://github.com/python/importlib_metadata/pull/451
Now that the behavior is in Python 3.12, the repro is:
draft 🐚 py -m venv .venv
draft 🐚 py -W error::DeprecationWarning:pip._internal.metadata.importlib._dists -m pip install sampleproject
Collecting sampleproject
Using cached sampleproject-3.0.0-py3-none-any.whl.metadata (4.4 kB)
Collecting peppercorn (from sampleproject)
Using cached peppercorn-0.6-py3-none-any.whl.metadata (3.4 kB)
Using cached sampleproject-3.0.0-py3-none-any.whl (4.7 kB)
Using cached peppercorn-0.6-py3-none-any.whl (4.8 kB)
ERROR: Exception:
Traceback (most recent call last):
File "/Users/jaraco/.local/pip-run/pip/_internal/cli/base_command.py", line 179, in exc_logging_wrapper
status = run_func(*args)
File "/Users/jaraco/.local/pip-run/pip/_internal/cli/req_command.py", line 67, in wrapper
return func(self, options, args)
File "/Users/jaraco/.local/pip-run/pip/_internal/commands/install.py", line 377, in run
requirement_set = resolver.resolve(
reqs, check_supported_wheels=not options.target_dir
)
File "/Users/jaraco/.local/pip-run/pip/_internal/resolution/resolvelib/resolver.py", line 179, in resolve
self.factory.preparer.prepare_linked_requirements_more(reqs)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
File "/Users/jaraco/.local/pip-run/pip/_internal/operations/prepare.py", line 554, in prepare_linked_requirements_more
self._complete_partial_requirements(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
partially_downloaded_reqs,
^^^^^^^^^^^^^^^^^^^^^^^^^^
parallel_builds=parallel_builds,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/Users/jaraco/.local/pip-run/pip/_internal/operations/prepare.py", line 489, in _complete_partial_requirements
self._prepare_linked_requirement(req, parallel_builds)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jaraco/.local/pip-run/pip/_internal/operations/prepare.py", line 642, in _prepare_linked_requirement
dist = _get_prepared_distribution(
req,
...<3 lines>...
self.check_build_deps,
)
File "/Users/jaraco/.local/pip-run/pip/_internal/operations/prepare.py", line 75, in _get_prepared_distribution
return abstract_dist.get_metadata_distribution()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/Users/jaraco/.local/pip-run/pip/_internal/distributions/wheel.py", line 34, in get_metadata_distribution
return get_wheel_distribution(wheel, canonicalize_name(self.req.name))
File "/Users/jaraco/.local/pip-run/pip/_internal/metadata/__init__.py", line 106, in get_wheel_distribution
return select_backend().Distribution.from_wheel(wheel, canonical_name)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jaraco/.local/pip-run/pip/_internal/metadata/importlib/_dists.py", line 133, in from_wheel
dist = WheelDistribution.from_zipfile(zf, name, wheel.location)
File "/Users/jaraco/.local/pip-run/pip/_internal/metadata/importlib/_dists.py", line 73, in from_zipfile
return cls(files, info_location)
File "/opt/python/lib/python3.13/importlib/metadata/__init__.py", line 344, in __new__
warnings.warn(
~~~~~~~~~~~~~^
f"Unimplemented abstract methods {abstract}",
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
DeprecationWarning,
^^^^^^^^^^^^^^^^^^^
stacklevel=2,
^^^^^^^^^^^^^
)
^
DeprecationWarning: Unimplemented abstract methods {'locate_file'}
And we can probably expect pip to start failing in Python 3.14.
So I've taken a look at the code. I see now that pip implements the WheelDistribution because importlib.metadata.PathDistribution holds the zipfile.Path open too long for pip's needs.
I wonder if a better way to deal with the situation is to eliminate this concern (and the need for WheelDistribution altogether). Did you know that zipfile.Path can open a zipfile in memory? How much pain would users notice if pip were to load the whole wheel into memory for the life of the Distribution object? If that's too much overhead, one could simply construct a metadata-only wheel (in-memory) and pass that to PathDistribution, effectively implementing the same behavior that's there currently, but without creating a whole new WheelDistribution object.
I'm going to push back on this because you (or more accurately the importlib.metadata docs) haven't explained why our implementation isn't acceptable. It may well be that we could do as you say, but that's new, fragile, code that would need writing and maintaining, and surely the whole point of allowing user implementations of distribution objects is so that users can implement custom behaviours - if the docs don't explain the limitations on doing so, the docs should be fixed first, so that users can be sure that any reimplementation required of them will be supported going forward, and won't be subject to future unexpected breakages like this one.
the
importlib.metadatadocs) haven't explained why our implementation isn't acceptable
The trivial "why" is because it's an abstract method, so according to OO rules, subclasses need to implement it (even if there's no good justification for requiring it). Admittedly, that's a pretty weak rationale.
I reviewed the latest extending docs, which do include an example that implements locate_file raising a NotImplementedError, but they don't spend any time explaining why it's implemented or under what circumstances it could be implemented.
I expanded the docstring for Distribution.locate_file to elaborate on the rationale - why does that method exist and what happens when it's not implemented.
Does that clarify? Would you like to see more explanation in the Extending docs as well?
It may well be that we could do as you say, but that's new, fragile, code that would need writing and maintaining
In 93f71fa, I've started exploring what the change would look like. At first, I'm exploring simply loading any wheel that's not already in memory into memory. I'm optimistic that the approach, as it's only three lines of code replacing a large class, is promising to be much less to maintain.
Does that clarify? Would you like to see more explanation in the Extending docs as well?
An explicit note in the narrative text about DatabaseDistribution would be helpful, that explains that because this distribution type doesn't implement a full filesystem, it's OK to raise an error from the locate_file method, although this will mean that the importlib.metadata.files function will fail if it encounters a DatabaseDistribution.
With that, along with the docstring content being reflected in the API docs (I assume that will happen, it's not sufficient IMO for it to only be in the source code), I think it's sufficient for our purposes.
In 93f71fa, I've started exploring what the change would look like. At first, I'm exploring simply loading any wheel that's not already in memory into memory.
I wouldn't bother. It sounds like we'd be better just raising an error from the method. We don't call files[^1], and pip's internal API isn't supported for use by 3rd party code, so there's no need to care about that scenario.
I think it's a code smell that it's not OK to omit the implementation of this method, but it is OK to implement a method which just raises an exception - but the code smell is (to me) in the design of the importlib.metadata API, not in pip's usage of that API.
[^1]: Someone should check that! I'm saying it on the basis of your assertion that files() needs the locate_file method and will fail without it. So I'm assuming that if we did call it, we'd have had test failures or user bug reports by now.
I think it's a code smell that it's not OK to omit the implementation of this method, but it is OK to implement a method which just raises an exception - but the code smell is (to me) in the design of the
importlib.metadataAPI, not in pip's usage of that API.
I think I agree with this assessment. My aim was to strongly encourage custom providers to provide this interface, but stop short of requiring it in order to leave the door open for cases where providing it was unnecessary for a narrow use-case or impossible or too onerous for the underlying model to implement. pip seems to falls into the "unnecessary for a narrow use-case" while the "DatabaseDistribution" falls more into the "too onerous to implement".
An alternative approach could be to provide a default implementation of .locate_file that raises a NotImplementedError and leave it to providers to encounter that error and realize they wish to provide an implementation. If you think that would be a better approach, I'd be open to it, but thusfar no one has proposed it. I'd certainly be amenable to such a change if that provides a better experience for custom providers.
This warning annoyingly makes github to mark workflows with red cross indicating some failure. Example: https://github.com/Fak3/enjam/actions/runs/11346546713
How GitHub decides to select and present error/warning messages from CI logs isn't in our control unfortunately.