Add a resolver option to use the specified minimum version for a dependency

Open dhellmann opened this issue 4 years ago • 76 comments

What's the problem this feature will solve?

I would like to be able to install my project using its "lower bounds" requirements and run the test suite to ensure that (a) I have those lower bounds specified properly and (b) the tests pass.

Describe the solution you'd like

A new command line option --prefer-minimum-versions would change the resolver behavior to choose the earliest version supported by a requirement specification. For example if versions 1.0 and 2.0 of package foo are available and the specification is foo>=1.0 then when the flag is used version 1.0 would be installed and when the flag is not used version 2.0 would be installed.
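The difference can be sketched in a few lines of Python. This is only an illustration, assuming simple dotted numeric versions and a single >= bound; pick_version and its prefer_minimum parameter are hypothetical names, not part of pip's resolver.

```python
def parse(version):
    """Turn '1.10' into (1, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_version(available, lower_bound, prefer_minimum=False):
    """Pick one version from `available` that satisfies `>= lower_bound`.

    Hypothetical helper: returns the earliest match when prefer_minimum
    is set, otherwise the latest match.
    """
    matching = [v for v in available if parse(v) >= parse(lower_bound)]
    if not matching:
        raise LookupError("no version satisfies >=" + lower_bound)
    chooser = min if prefer_minimum else max
    return chooser(matching, key=parse)


print(pick_version(["1.0", "2.0"], "1.0"))                       # 2.0
print(pick_version(["1.0", "2.0"], "1.0", prefer_minimum=True))  # 1.0
```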

Large applications such as OpenStack have a lot of dependencies and verifying the accuracy of the complete set is complex. Providing a way to install the earliest set of packages expected to work would make this easier.

Alternative Solutions

The existing constraints file option does help, but building a valid constraints file is complicated.
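As a rough sketch of the constraints-file route, the snippet below turns simple name>=version requirements into name==version pins. It is a simplification under stated assumptions: the regular expression only understands plain >= specifiers (no extras, markers, or URLs), and lower_bound_constraints is a hypothetical helper, not an existing tool.

```python
import re

# Matches "name>=version"; anything more complex is skipped.
_LOWER_BOUND = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*>=\s*([0-9][A-Za-z0-9.]*)")


def lower_bound_constraints(requirements):
    """Return 'name==version' pins for every requirement with a >= bound."""
    pins = []
    for line in requirements:
        match = _LOWER_BOUND.match(line)
        if match:
            name, version = match.groups()
            pins.append(f"{name}=={version}")
    return pins


requirements = ["foo>=1.0", "bar>=2.3,<3", "baz"]
print("\n".join(lower_bound_constraints(requirements)))
# foo==1.0
# bar==2.3
```

The resulting lines could be saved to a file and passed to pip install with the -c option. Note that this only covers requirements the project declares itself and silently skips anything without a >= bound, which is part of why building a complete, valid file by hand gets complicated.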

Additional context

There is a recent discussion about this need on the openstack-discuss mailing list.

I will work on the implementation.

dhellmann avatar Apr 19 '20 17:04 dhellmann

I like this idea. Preferring a minimum version is a valid tactic in many scenarios, and is even the default in some package managers. I don’t think it’s a good idea to default to the lowest possible version, but it’s reasonable to have it as a configurable option.

The tricky part is how to expose the functionality to the user though. Having a separate --prefer-minimum-versions feels wrong to me, since the flag does not make sense in some cases. Maybe this should be included as a part of the --upgrade-strategy redesign process. For example, introduce a new --strategy flag as the replacement, and have this as one of the possible values.

uranusjr avatar Apr 20 '20 09:04 uranusjr

I don't see the upgrade_strategy argument to the new resolver being used at all. Is that part of other work that someone else is doing?

It does seem to make sense to fold the behavior change into the strategy, as long as it isn't something that we would want to combine with other strategies. It looks like other strategies include only-if-needed, eager, and to-satisfy-only. I'm not sure what the distinction is between only-if-needed and to-satisfy-only. During an upgrade, I could see someone wanting to say the equivalent of "update if you have to, but move to the oldest possible version you can". How would someone express that if "prefer-minimum" is a separate strategy from those other options?

dhellmann avatar Apr 22 '20 14:04 dhellmann

I don't see the upgrade_strategy argument to the new resolver being used at all.

It isn't yet. But we expect to add that soon (subject to some questions over how well the existing strategies fit with how the new resolver works).

In terms of the new resolver, "eager" means "don't prioritise already-installed versions over other versions". And "only-if-needed" would prioritise already-installed versions. The "to-satisfy-only" option isn't really relevant as it's more of an "internal" state (its behaviour is a bit weird, so I won't confuse things by explaining here).

Minimum version would be easy enough to specify by preferring older versions over newer ones.

The big question, as I see it, is how to let the user specify their intent correctly. Suppose there's some dependency in the tree that doesn't specify a minimum version. Would you want to install version 0.0.1 (or whatever ancient version) in that case? And surely "upgrade to minimum version possible" is just "don't upgrade" - the currently installed version is pretty much by definition the minimum version allowed...

So I think that technically, this is relatively straightforward to implement, but we'd need help in designing a user interface, in terms of command line options, to allow the user to make meaningful requests, while not turning things into a complex mess that no-one can understand :-)

pfmoore avatar Apr 22 '20 15:04 pfmoore

I don't see the upgrade_strategy argument to the new resolver being used at all.

It isn't yet. But we expect to add that soon (subject to some questions over how well the existing strategies fit with how the new resolver works).

OK. I was asking because I wasn't sure how to fit this in. I can add a --strategy option to replace the --upgrade-strategy option as @uranusjr suggested, and ensure the strategy is passed to the resolver as upgrade_strategy. After that, I'm less sure what to do. :-)

Is the plan to define some classes to represent the behaviors of the strategy so that the code can call methods instead of checking string literals in different places? Or do you think the strategies would be completely encompassed by the resolver itself, so the string literals would be fine? Either way, I expect we would need some changes in resolvelib, too. How much of the definition of the strategies should be owned by the library instead of pip itself?

If there's anything written down that I can look at to come up to speed, feel free to respond just with links. I have a little time this week so I'd like to help, if I can.

In terms of the new resolver, "eager" means "don't prioritise already-installed versions over other versions". And "only-if-needed" would prioritise already-installed versions. The "to-satisfy-only" option isn't really relevant as it's more of an "internal" state (its behaviour is a bit weird, so I won't confuse things by explaining here).

Minimum version would be easy enough to specify by preferring older versions over newer ones.

The big question, as I see it, is how to let the user specify their intent correctly. Suppose there's some dependency in the tree that doesn't specify a minimum version. Would you want to install version 0.0.1 (or whatever ancient version) in that case? And surely "upgrade to minimum version possible" is just "don't upgrade" - the currently installed version is pretty much by definition the minimum version allowed...

I would say, yes, install 0.0.1. I consider not specifying a minimum version a bug in the packaging specs, and if 0.0.1 doesn't work then the tests run with this new flag set would expose the bug. I realize other folks may not have quite that strict an interpretation, though. :-) I guess saying that the strategy would install the "earliest version that can be found" in that case would at least be clear and easy to understand. Maybe that means a better name for the strategy is something like "earliest-compatible"?

So I think that technically, this is relatively straightforward to implement, but we'd need help in designing a user interface, in terms of command line options, to allow the user to make meaningful requests, while not turning things into a complex mess that no-one can understand :-)

I agree, the implementation in #8086 was quite straightforward, and the harder part will be the UI and internal API changes.

dhellmann avatar Apr 22 '20 15:04 dhellmann

I've joined #pypa-dev on freenode as dhellmann, in case anyone wants to chat about this with less latency. I can summarize anything said there here in the ticket for easier reference later.

dhellmann avatar Apr 22 '20 15:04 dhellmann

Is the plan to define some classes to represent the behaviors of the strategy so that the code can call methods instead of checking string literals in different places? Or do you think the strategies would be completely encompassed by the resolver itself, so the string literals would be fine? Either way, I expect we would need some changes in resolvelib, too. How much of the definition of the strategies should be owned by the library instead of pip itself?

As an example of what I mean here, I could see a Strategy class hierarchy defining a method get_preferred_candidate() to implement the PipProvider method get_preference() so the provider doesn't have to be aware of all of the strategies. The Strategy would also need to define a method like sort_candidates() to be used by resolvelib.Resolution._attempt_to_pin_criterion().

I'm sure other strategies would cause the API for Strategy to need to expand in other ways.

dhellmann avatar Apr 22 '20 16:04 dhellmann

I've joined #pypa-dev on freenode as dhellmann, in case anyone wants to chat about this with less latency.

We're discussing resolver things on Zulip rather than IRC.

As an example of what I mean here, I could see a Strategy class hierarchy defining a method get_preferred_candidate() to implement the PipProvider method get_preference() so the provider doesn't have to be aware of all of the strategies.

The get_preference method isn't related to this. It's a "which thing should we check next" tuning knob to control the internal progress of the resolver. The method that matters here is find_matches (and specifically the order of the candidates it returns).
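As a rough illustration of why candidate order is the lever here: a resolver that tries candidates in the order it is given ends up with whichever acceptable one comes first. The find_matches below is a standalone toy, not pip's provider method; its signature, the is_allowed callback, and the prefer_minimum flag are all assumptions made for the example.

```python
def version_key(version):
    """Turn '1.10' into (1, 10) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def find_matches(available, is_allowed, prefer_minimum=False):
    """Return allowed versions in the order a resolver should try them."""
    allowed = [v for v in available if is_allowed(v)]
    # Newest first by default; oldest first when minimum versions are preferred.
    return sorted(allowed, key=version_key, reverse=not prefer_minimum)


available = ["1.0", "1.5", "2.0", "10.0"]


def not_two(version):
    """Stand-in for a specifier such as != 2.0."""
    return version != "2.0"


print(find_matches(available, not_two))                       # ['10.0', '1.5', '1.0']
print(find_matches(available, not_two, prefer_minimum=True))  # ['1.0', '1.5', '10.0']
```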

I'm planning on looking at this myself tomorrow, as I've had upgrade strategies on my task list for a week or so now :-) At the moment, I'm a fairly strong -1 on strategy classes - I feel that they'd likely just be over-engineering at the moment. IMO we've already got probably more classes in the new resolver code than we really need...

But I've shut down my "working on pip" PC for the day now, so I'll refrain from going into any further detail just from memory.

pfmoore avatar Apr 22 '20 16:04 pfmoore

I've joined #pypa-dev on freenode as dhellmann, in case anyone wants to chat about this with less latency.

We're discussing resolver things on Zulip rather than IRC.

Ah. I don't know what that is. The docs pointed me to IRC. How do I get to the right place in Zulip?

As an example of what I mean here, I could see a Strategy class hierarchy defining a method get_preferred_candidate() to implement the PipProvider method get_preference() so the provider doesn't have to be aware of all of the strategies.

The get_preference method isn't related to this. It's a "which thing should we check next" tuning knob to control the internal progress of the resolver. The method that matters here is find_matches (and specifically the order of the candidates it returns).

OK. That wasn't what I found when looking at the implementation I've already done, but I'll take a look at find_matches().

I'm planning on looking at this myself tomorrow, as I've had upgrade strategies on my task list for a week or so now :-) At the moment, I'm a fairly strong -1 on strategy classes - I feel that they'd likely just be over-engineering at the moment. IMO we've already got probably more classes in the new resolver code than we really need...

OK, I can understand that. There do seem to be a lot of different parts working together and https://github.com/dhellmann/pip/commit/de6e70d3abcc5638777e4ce169b4061b9c75ac18 didn't come out particularly clean. :-)

dhellmann avatar Apr 22 '20 17:04 dhellmann

I've joined #pypa-dev on freenode as dhellmann, in case anyone wants to chat about this with less latency.

We're discussing resolver things on Zulip rather than IRC.

Ah. I don't know what that is. The docs pointed me to IRC. How do I get to the right place in Zulip?

Nevermind, found it.

dhellmann avatar Apr 22 '20 17:04 dhellmann

@dhellmann Glad to hear that you're willing to help out with the implementation. ^>^

To set expectations early, this is a feature request for adding new functionality to pip. As per pip's release cadence, the next release with new features would be in July (pip 20.2) so arguably, IMO there's no hurry toward implementing this.

Further, as you've discovered, implementing this feature will be significantly easier to do with the new resolver's architecture than with the old resolver. However, our priority currently is to get the new resolver to feature parity with the existing resolver and roll it out to become the default this year. IMO implementing new features related to dependency resolution is going to be significantly lower priority for us, in the short term, while we work on replacing a core component of pip.

pradyunsg avatar Apr 22 '20 18:04 pradyunsg

I understand the priorities, and am not in a particular hurry for a release. That said, I have more time to work on pip in the next few days than I’m likely to have later. So let’s see where we get with things as you have time, too.

Are any of the higher priority tasks things I might be able to help with?

dhellmann avatar Apr 22 '20 18:04 dhellmann

This is pointing in a different direction from dependency resolution, but https://github.com/pypa/pip/issues/4625 would be great to solve and would be a significant usability improvement for users (especially ones that are on Linux and using the system Python with sudo).

pradyunsg avatar Apr 22 '20 20:04 pradyunsg

I would be very interested in this feature. Any updates on its implementation status?

pohlt avatar Oct 05 '21 16:10 pohlt

Basically, no-one is currently working on it, and the discussion in this thread is all there is. The biggest questions remain how to design a user-friendly interface for this, and make the behaviour intuitive for people (for example, I'm still not at all sure that if no lower bound is specified somewhere in the dependency tree, getting a version from 15 years ago and progressively working forward through the versions until you reach something that works, is a good user experience).

But nothing will happen unless someone is willing to do the design and implementation work, so any such discussions are pointless at the moment.

pfmoore avatar Oct 05 '21 17:10 pfmoore

Thanks for the update. There was an initial PR #8086 from @dhellmann which didn't get a lot of attention or feedback from the package owners. Busy times, I know. 😉

So without at least some commitment from the package owners, nobody will go down the same road and starve again, I guess. Well, at least I wouldn't.

If no lower bound is given, just try the 15-year-old version and watch how everything goes up in flames. I don't think that's a likely scenario. Anyone who knows about the "minimum version" option and activates it will be clever enough to know that a missing lower bound is bound to break. Or you could simply stop and tell the user to add a minimum version.

pohlt avatar Oct 05 '21 18:10 pohlt

I think the discussion in this thread already showed the maintainers do not object to the idea at all. But before going forward with an actual implementation, the design decisions need to be discussed and resolved. An implementation without that design discussion is destined to wilt: it has no way to be accepted without some kind of consensus, and no reviewers would spend volunteer time reading code that is likely going to be thrown away. If you want to drive the feature forward, you need to consider the design issues raised in this thread and come up with a piece to explain what you have in mind, and more importantly, why you feel that is the correct design for the problem at hand, and then you will get the "commitment" you are looking for.

uranusjr avatar Oct 05 '21 18:10 uranusjr

Could you please elaborate on what you don't like about #8086? To me, without any knowledge about the overall design philosophy of pip, it looks like a clean and minimal PR lacking test coverage.

pohlt avatar Oct 07 '21 06:10 pohlt

I've already said that I dislike blindly getting the oldest version when there's no lower bound specified. I haven't had time to research the precise details of the sort of failure case I'm imagining, but consider a set of requirements that, somewhere 5 or 6 levels down in the dependency tree, says something like numpy != 1.21.1. And you're on Python 3.9 but have no C compiler. Then pip will try to build about 90 source releases of numpy versions that have no Python 3.9 wheels, before finding the oldest version with a wheel. That's going to be horribly slow - and because the dependency is way down in the tree, may not be easily fixable (or even identifiable) by the user.
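The cost can be illustrated with a toy model. The release list below is made up (it is not numpy's real release history), and count_source_builds is a hypothetical helper; the only assumption is that, without a compiler, every source build attempt fails and the next candidate is tried.

```python
def count_source_builds(releases, oldest_first):
    """Count sdist builds attempted before the first release with a wheel.

    `releases` is a list of (version, has_wheel) pairs ordered oldest to
    newest. Each release without a wheel costs one failed build attempt.
    """
    order = releases if oldest_first else list(reversed(releases))
    attempts = 0
    for _version, has_wheel in order:
        if has_wheel:
            return attempts
        attempts += 1
    return attempts


# Made-up history: 90 old releases without a usable wheel, then 5 with one.
releases = [(f"0.{n}", False) for n in range(90)]
releases += [(f"1.{n}", True) for n in range(5)]

print(count_source_builds(releases, oldest_first=True))   # 90
print(count_source_builds(releases, oldest_first=False))  # 0
```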

I think any solution should deal with situations like this reasonably cleanly, but I don't know how that would work. We've had enough complaints that the standard resolve ordering results in long install times where the user can't work out what's taking the time (when it's relatively "obvious" to people with a lot of experience with the resolver) to make me think that this is not going to be as rare a situation as you hope it will be...

pfmoore avatar Oct 07 '21 08:10 pfmoore

Just emitting a warning message while building that a dependency lacks a specified minimum version is maybe enough? Or maybe even making it a hard error, forcing the user to specify the minimum version themselves? I'm fine with the resolver just quitting with a hard error instead of trying too hard for this mode.

thomasf avatar Oct 07 '21 08:10 thomasf

As I have proposed above, pip could simply refuse to run (or stop execution) if no lower bound is given for any requirement and it is being run in the --minimum-version mode. Of course, a good warning/error message would be appreciated.

For me, this requested feature actually is about trying to break things in the sense that I want to test my minimum requirements.

It could also make sense to restrict this "minimum version" rule to direct dependencies (i.e. not to sub-dependencies). This would also mitigate the problem that a sub-dependency has no lower bound and would stop pip (see proposal above) or make it horribly slow.

pohlt avatar Oct 07 '21 08:10 pohlt

I think just outright refusing to run if any of the requirements lacks a lower bound is reasonable (not necessarily user-friendly since that restriction will also apply for all transitive packages, but that's something we can build tooling around). The only thing I don't like is the --prefer-minimum-versions flag, also mentioned above.
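A minimal sketch of such a check, assuming requirements arrive as plain strings: the regular expression is a deliberate simplification of the real specifier grammar, and both function names are hypothetical.

```python
import re

# Operators that put a floor under the version: >=, >, ==, ===, ~=
_HAS_FLOOR = re.compile(r"(>=|>|===?|~=)\s*[0-9]")


def missing_lower_bounds(requirements):
    """Return the requirement strings that do not declare a lower bound."""
    missing = []
    for req in requirements:
        specifier_part = req.split(";", 1)[0]  # ignore environment markers
        if not _HAS_FLOOR.search(specifier_part):
            missing.append(req)
    return missing


def require_lower_bounds(requirements):
    """Raise instead of resolving when any requirement has no floor."""
    missing = missing_lower_bounds(requirements)
    if missing:
        raise ValueError("no lower bound specified for: " + ", ".join(missing))


try:
    require_lower_bounds(["foo>=1.0", "bar<3", "baz"])
except ValueError as error:
    print(error)  # no lower bound specified for: bar<3, baz
```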

uranusjr avatar Oct 08 '21 04:10 uranusjr

Ok, so let me summarize:

  • The PR in general is fine.
  • You don't like --prefer-minimum-versions. Proposals would be highly appreciated.
  • Still missing:
    • Tests
    • Stop execution if lower bound missing

The more I think about it, the more I like the idea of applying this "minimum version" rule only to direct dependencies. If applied for all transitive packages, I'm also testing their reasonable choice for a lower bound, which is beyond the scope of my tests. What do you think?

pohlt avatar Oct 08 '21 06:10 pohlt

The more I think about it, the more I like the idea of applying this "minimum version" rule only to direct dependencies. If applied for all transitive packages, I'm also testing their reasonable choice for a lower bound, which is beyond the scope of my tests. What do you think?

I am not sure I fully understand what this means.

My reason for wanting this feature would be to be able to have a single requirements.txt for an application that would install with some form of predictability given the same OS/environment, without additional tools and lock/freeze files (my reasons are outlined here: https://github.com/pypa/pip/issues/10207#issue-952767638).

I think that having different versioning rules at different dependency depths would be surprising (which means not good) and probably hard to understand for some users. Dependency resolvers are hard for many users as it is without intentionally making them even more complicated.

thomasf avatar Oct 08 '21 07:10 thomasf

The more I think about it, the more I like the idea of applying this "minimum version" rule only to direct dependencies.

I'm not sure what you mean by "direct" dependencies. Requirements stated on the command line and/or requirements file? Requirements declared in their dependency metadata? Both? This seems like a weird rule - it means that if you copy a requirement from deeper in the dependency tree and add it to the command line (something we occasionally advise people to do to address complex backtracking issues) that would radically change what gets installed. IMO, that's flat-out wrong (writing requirements in a different order may change performance, but shouldn't change the end result).

pfmoore avatar Oct 08 '21 07:10 pfmoore

I'm using the definitions here.

My user story: As a package developer, I want to make sure that the lower bounds of the packages I define as direct dependencies (in setup.py or pyproject.toml) make sense, i.e., that my package passes all the tests with the lowest version of dependent packages installed. [Side remark: I cannot directly influence transitive dependencies other than making them direct dependencies.] My golden rule for the selection of a lower bound for a direct dependency is "as low as possible, as high as necessary" to give users of my package the most flexibility. The requested feature would allow for an easy test of these lower bounds.

This leaves room for interpretation. For instance, it does not define which version for transitive dependencies should be chosen. The initial discussion and the PR were assuming that all packages (direct and transitive) should be the lowest possible version. What I am proposing now (and maybe it doesn't make sense) is to use the standard strategy (highest available version) for transitive dependencies, because I cannot directly fix the lower bounds for transitive dependencies.
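The proposal can be sketched as follows. The package names and version lists are placeholders, and choose is a hypothetical helper; the versions are assumed to have already been filtered by each requirement's specifier.

```python
def version_key(version):
    """Turn '1.10' into (1, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def choose(candidates, is_direct):
    """Lowest allowed version for direct dependencies, highest otherwise."""
    chooser = min if is_direct else max
    return chooser(candidates, key=version_key)


# Placeholder data: versions allowed for each package after filtering.
allowed = {
    "foo": ["1.0", "1.4", "2.0"],  # declared by my package (direct)
    "bar": ["0.9", "0.10"],        # pulled in by foo (transitive)
}
direct = {"foo"}

selection = {name: choose(versions, name in direct)
             for name, versions in allowed.items()}
print(selection)  # {'foo': '1.0', 'bar': '0.10'}
```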

pohlt avatar Oct 08 '21 09:10 pohlt

I cannot directly influence transitive dependencies other than making them direct dependencies

You can use a constraints file ( https://pip.pypa.io/en/stable/user_guide/#constraints-files ), which is probably what I would want to use to control upgrades of transitive dependencies in general when minimal version selection is used in a project.

thomasf avatar Oct 08 '21 09:10 thomasf

What I meant with "influence": If a direct dependency of my package messed up their lower bounds or didn't define lower bounds at all, I cannot raise their lower bounds (literally changing their setup.py). I can make my tests work again by using constraint files, as you mentioned, but pip could also just use the regular strategy for transitive packages.

This is off-topic, but if pip had a well-documented API, it could be rather simple to influence its inner workings (like version selection strategy). Maybe there is and I just couldn't find it.

pohlt avatar Oct 08 '21 10:10 pohlt

This is off-topic, but if pip had a well-documented API

I sort of agree (but not in the way you mean, I suspect 😉) The use case you seem to be describing sounds like you want pretty fine control over a lot of what pip is doing, to make sure you test what you're trying to test. That's a scenario that's not well served by a massive monolithic program like pip. What you really need (IMO) is a set of smaller tools and/or libraries that let you compose the mechanism you want from individual well-tested pieces.

Basically, all of the standards work we've been doing for years is intended to try to enable that sort of separation of concerns. It's far from complete, but many of the parts are in place. What hasn't happened yet, is for an ecosystem of libraries to get developed around that (it's happening, with things like packaging, build and installer, but there are still key parts that no-one has addressed yet, or which only exist in "proof of concept" form). But there's no way pip is ever going to be that sort of library - it was designed as a monolithic application, and the internals are not suitable for exposing as a library (and even if they were, we don't have the manpower to even consider putting everything else on hold for long enough to rewrite everything as reusable APIs).

So unfortunately, until more people start writing the pieces needed to build something like pip from reusable components, you won't be able to influence the resolve/install process to the level you want to. (And yes, as such libraries come into existence, pip will likely switch to using them rather than having to maintain our own implementations).

pfmoore avatar Oct 08 '21 11:10 pfmoore

It deeply concerns me that with all the funding the PSF gets, the PyPA still seems to be understaffed/underfunded. Packaging is such a central concern for any programming language that it should not rest mainly on the shoulders of volunteers like you. I don't know any details if and how much PSF is supporting PyPA, so my concerns might be completely unjustified.

If I understood you correctly, you think "my" (actually raised by Doug) use case is too specific to make it into pip. Is that correct? No hard feelings, I'm just trying to avoid spending even more time on a lost cause.

pohlt avatar Oct 08 '21 11:10 pohlt

It deeply concerns me that with all the funding the PSF gets, the PyPA still seems to be understaffed/underfunded. Packaging is such a central concern for any programming language that it should not rest mainly on the shoulders of volunteers like you. I don't know any details if and how much PSF is supporting PyPA, so my concerns might be completely unjustified.

Well, as far as I know, more $$$ from the PSF has gone toward directly funding packaging-related projects than CPython itself -- mostly through the PSF acting as a fiscal entity for targeted grants and the work done by the Packaging-WG of the PSF.

From my understanding, the problem isn't that the PSF won't direct funds toward packaging projects. Rather, it is that there simply isn't a lot of funding that the PSF gets, to direct toward software development (especially given the impact that PSF's other investments, like the grants programme, are able to achieve). This is something that many people are working toward improving and I'm optimistic that things will get better over time. In fact, right now, there are two funded-via-targeted-sponsorship roles: a Developer-in-residence for CPython and a Packaging Project Manager (the latter is sponsored by my employer). :)

The PSF's sponsorship programme is one way to support the PSF's ongoing endeavours on these fronts. If you're focused on packaging, there's the Packaging-WG of the PSF who are more than happy to talk about funding for Python Packaging improvements. :)

pradyunsg avatar Oct 08 '21 11:10 pradyunsg