pip icon indicating copy to clipboard operation
pip copied to clipboard

WheelDistribution doesn't implement abstract methods of importlib.metadata.Distribution

Open jaraco opened this issue 2 years ago • 9 comments

Description

In https://github.com/python/cpython/pull/100466, we learned that pip implements a WheelDistribution derived from importlib.metadata.Distribution, but doesn't implement the abstract methods. This usage was allowed because the ABC didn't derive from abc.ABCMeta. I don't recall if that choice was intentional or not, but attempting to enforce the definition of abstract methods in subclasses breaks pip invocations.

Expected behavior

Pip should implement the abstract methods (even if just trivially).

pip version

22.3.1

Python version

3.12a3ish

OS

all

How to Reproduce

Apply the patch from the referenced Python PR to the Python stdlib and invoke pip install.

Output

No response

Code of Conduct

jaraco avatar Jan 01 '23 16:01 jaraco

It should be noted that partial implementation of the Distribution interface is now deprecated: https://github.com/python/importlib_metadata/pull/451

ichard26 avatar Apr 25 '24 19:04 ichard26

Now that the behavior is in Python 3.12, the repro is:

 draft 🐚 py -m venv .venv
 draft 🐚 py -W error::DeprecationWarning:pip._internal.metadata.importlib._dists -m pip install sampleproject
Collecting sampleproject
  Using cached sampleproject-3.0.0-py3-none-any.whl.metadata (4.4 kB)
Collecting peppercorn (from sampleproject)
  Using cached peppercorn-0.6-py3-none-any.whl.metadata (3.4 kB)
Using cached sampleproject-3.0.0-py3-none-any.whl (4.7 kB)
Using cached peppercorn-0.6-py3-none-any.whl (4.8 kB)
ERROR: Exception:
Traceback (most recent call last):
  File "/Users/jaraco/.local/pip-run/pip/_internal/cli/base_command.py", line 179, in exc_logging_wrapper
    status = run_func(*args)
  File "/Users/jaraco/.local/pip-run/pip/_internal/cli/req_command.py", line 67, in wrapper
    return func(self, options, args)
  File "/Users/jaraco/.local/pip-run/pip/_internal/commands/install.py", line 377, in run
    requirement_set = resolver.resolve(
        reqs, check_supported_wheels=not options.target_dir
    )
  File "/Users/jaraco/.local/pip-run/pip/_internal/resolution/resolvelib/resolver.py", line 179, in resolve
    self.factory.preparer.prepare_linked_requirements_more(reqs)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/Users/jaraco/.local/pip-run/pip/_internal/operations/prepare.py", line 554, in prepare_linked_requirements_more
    self._complete_partial_requirements(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        partially_downloaded_reqs,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
        parallel_builds=parallel_builds,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/jaraco/.local/pip-run/pip/_internal/operations/prepare.py", line 489, in _complete_partial_requirements
    self._prepare_linked_requirement(req, parallel_builds)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jaraco/.local/pip-run/pip/_internal/operations/prepare.py", line 642, in _prepare_linked_requirement
    dist = _get_prepared_distribution(
        req,
    ...<3 lines>...
        self.check_build_deps,
    )
  File "/Users/jaraco/.local/pip-run/pip/_internal/operations/prepare.py", line 75, in _get_prepared_distribution
    return abstract_dist.get_metadata_distribution()
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/Users/jaraco/.local/pip-run/pip/_internal/distributions/wheel.py", line 34, in get_metadata_distribution
    return get_wheel_distribution(wheel, canonicalize_name(self.req.name))
  File "/Users/jaraco/.local/pip-run/pip/_internal/metadata/__init__.py", line 106, in get_wheel_distribution
    return select_backend().Distribution.from_wheel(wheel, canonical_name)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jaraco/.local/pip-run/pip/_internal/metadata/importlib/_dists.py", line 133, in from_wheel
    dist = WheelDistribution.from_zipfile(zf, name, wheel.location)
  File "/Users/jaraco/.local/pip-run/pip/_internal/metadata/importlib/_dists.py", line 73, in from_zipfile
    return cls(files, info_location)
  File "/opt/python/lib/python3.13/importlib/metadata/__init__.py", line 344, in __new__
    warnings.warn(
    ~~~~~~~~~~~~~^
        f"Unimplemented abstract methods {abstract}",
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        DeprecationWarning,
        ^^^^^^^^^^^^^^^^^^^
        stacklevel=2,
        ^^^^^^^^^^^^^
    )
    ^
DeprecationWarning: Unimplemented abstract methods {'locate_file'}

And we can probably expect pip to start failing in Python 3.14.

jaraco avatar Jul 25 '24 20:07 jaraco

So I've taken a look at the code. I see now that pip implements the WheelDistribution because importlib.metadata.PathDistribution holds the zipfile.Path open too long for pip's needs.

I wonder if a better way to deal with the situation is to eliminate this concern (and the need for WheelDistribution altogether). Did you know that zipfile.Path can open a zipfile in memory? How much pain would users notice if pip were to load the whole wheel into memory for the life of the Distribution object? If that's too much overhead, one could simply construct a metadata-only wheel (in-memory) and pass that to PathDistribution, effectively implementing the same behavior that's there currently, but without creating a whole new WheelDistribution object.

jaraco avatar Jul 25 '24 20:07 jaraco

I'm going to push back on this because you (or more accurately the importlib.metadata docs) haven't explained why our implementation isn't acceptable. It may well be that we could do as you say, but that's new, fragile, code that would need writing and maintaining, and surely the whole point of allowing user implementations of distribution objects is so that users can implement custom behaviours - if the docs don't explain the limitations on doing so, the docs should be fixed first, so that users can be sure that any reimplementation required of them will be supported going forward, and won't be subject to future unexpected breakages like this one.

pfmoore avatar Jul 25 '24 20:07 pfmoore

the importlib.metadata docs) haven't explained why our implementation isn't acceptable

The trivial "why" is because it's an abstract method, so according to OO rules, subclasses need to implement it (even if there's no good justification for requiring it). Admittedly, that's a pretty weak rationale.

I reviewed the latest extending docs, which do include an example that implements locate_file raising a NotImplementedError, but they don't spend any time explaining why it's implemented or under what circumstances it could be implemented.

I expanded the docstring for Distribution.locate_file to elaborate on the rationale - why does that method exist and what happens when it's not implemented.

Does that clarify? Would you like to see more explanation in the Extending docs as well?

jaraco avatar Jul 25 '24 21:07 jaraco

It may well be that we could do as you say, but that's new, fragile, code that would need writing and maintaining

In 93f71fa, I've started exploring what the change would look like. At first, I'm exploring simply loading any wheel that's not already in memory into memory. I'm optimistic that the approach, as it's only three lines of code replacing a large class, is promising to be much less to maintain.

jaraco avatar Jul 25 '24 21:07 jaraco

Does that clarify? Would you like to see more explanation in the Extending docs as well?

An explicit note in the narrative text about DatabaseDistribution would be helpful, that explains that because this distribution type doesn't implement a full filesystem, it's OK to raise an error from the locate_file method, although this will mean that the importlib.metadata.files function will fail if it encounters a DatabaseDistribution.

With that, along with the docstring content being reflected in the API docs (I assume that will happen, it's not sufficient IMO for it to only be in the source code), I think it's sufficient for our purposes.

In 93f71fa, I've started exploring what the change would look like. At first, I'm exploring simply loading any wheel that's not already in memory into memory.

I wouldn't bother. It sounds like we'd be better just raising an error from the method. We don't call files[^1], and pip's internal API isn't supported for use by 3rd party code, so there's no need to care about that scenario.

I think it's a code smell that it's not OK to omit the implementation of this method, but it is OK to implement a method which just raises an exception - but the code smell is (to me) in the design of the importlib.metadata API, not in pip's usage of that API.

[^1]: Someone should check that! I'm saying it on the basis of your assertion that files() needs the locate_file method and will fail without it. So I'm assuming that if we did call it, we'd have had test failures or user bug reports by now.

pfmoore avatar Jul 25 '24 21:07 pfmoore

I think it's a code smell that it's not OK to omit the implementation of this method, but it is OK to implement a method which just raises an exception - but the code smell is (to me) in the design of the importlib.metadata API, not in pip's usage of that API.

I think I agree with this assessment. My aim was to strongly encourage custom providers to provide this interface, but stop short of requiring it in order to leave the door open for cases where providing it was unnecessary for a narrow use-case or impossible or too onerous for the underlying model to implement. pip seems to falls into the "unnecessary for a narrow use-case" while the "DatabaseDistribution" falls more into the "too onerous to implement".

An alternative approach could be to provide a default implementation of .locate_file that raises a NotImplementedError and leave it to providers to encounter that error and realize they wish to provide an implementation. If you think that would be a better approach, I'd be open to it, but thusfar no one has proposed it. I'd certainly be amenable to such a change if that provides a better experience for custom providers.

jaraco avatar Jul 25 '24 22:07 jaraco

This warning annoyingly makes github to mark workflows with red cross indicating some failure. Example: https://github.com/Fak3/enjam/actions/runs/11346546713 ci

Fak3 avatar Oct 15 '24 13:10 Fak3

How GitHub decides to select and present error/warning messages from CI logs isn't in our control unfortunately.

ichard26 avatar Oct 26 '24 16:10 ichard26