packaging
Add a method to simplify a SpecifierSet
It would be useful, when reasoning about SpecifierSet values, to have a means of simplifying them. For example, SpecifierSet("<1.0, <2.0") is equivalent to SpecifierSet("<1.0"). Similarly, knowing that a complex specifier set is guaranteed to be empty can be useful when resolving requirements.
I have a prototype implementation which currently only merges multiple "<", "<=", ">" and ">=" specifiers, but which could be extended to simplify uses of other specifiers. I'd be willing to develop this into a full implementation and contribute it if there was interest.
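A minimal sketch of what merging ordered bounds could look like, for illustration only. This is not the prototype mentioned above and does not use the packaging API: versions are modelled as tuples of ints with the same number of segments, specifiers as hypothetical (operator, version) pairs, and pre-releases are ignored entirely.

```python
from typing import Optional

# Hypothetical stand-ins: a version is a tuple of ints such as (1, 0), and a
# specifier is an (operator, version) pair. Pre-releases are not modelled.
Version = tuple[int, ...]
Spec = tuple[str, Version]


def merge_bounds(specs: list[Spec]) -> list[Spec]:
    """Keep only the tightest lower bound and the tightest upper bound."""
    lower: Optional[Spec] = None
    upper: Optional[Spec] = None
    for op, ver in specs:
        if op in (">", ">="):
            # A higher version is tighter; on a tie, exclusive ">" beats ">=".
            if lower is None or ver > lower[1] or (ver == lower[1] and op == ">"):
                lower = (op, ver)
        elif op in ("<", "<="):
            # A lower version is tighter; on a tie, exclusive "<" beats "<=".
            if upper is None or ver < upper[1] or (ver == upper[1] and op == "<"):
                upper = (op, ver)
        else:
            raise ValueError(f"operator {op!r} is not handled by this sketch")
    return [spec for spec in (lower, upper) if spec is not None]


# "<1.0, <2.0" collapses to "<1.0".
print(merge_bounds([("<", (1, 0)), ("<", (2, 0))]))
```

Extending this to "==", "!=" and "~=" would need more than a pair of bounds, which is part of why the sketch stops here.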
One technical issue which would need discussion is around pre-release specifiers. The specifier set >1.0a1, >2.0 is equivalent to >2.0. However, the former "explicitly mentions" a prerelease version, whereas the latter does not. I'm not 100% sure what the packaging library's policy is on whether this matters when checking if a version satisfies a specifier. There are a number of options for this case: simplify anyway, refuse to simplify, or simplify and return a flag saying that an explicit pre-release version was eliminated in the simplification.
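To make the third option concrete, here is a rough sketch of a simplifier that returns such a flag. Everything in it is hypothetical (the `Simplified` type, `simplify_lower_bounds`, the regexes); it only handles ">" bounds given as strings, compares them by release number alone, ignores epochs, and falls back to "refuse to simplify" when two bounds share a release.

```python
import re
from typing import NamedTuple

# Matches an exclusive lower bound such as ">1.0a1": release digits, then suffix.
_LOWER = re.compile(r"^\s*>\s*(\d+(?:\.\d+)*)(.*)$")
_PRE_SUFFIX = re.compile(r"^\.?(a|b|rc|dev)\d*", re.IGNORECASE)


class Simplified(NamedTuple):
    specifiers: list[str]
    dropped_prerelease: bool  # True if an eliminated bound named a pre-release


def _parse(spec: str) -> tuple[tuple[int, ...], bool]:
    match = _LOWER.match(spec)
    if match is None:
        raise ValueError(f"this sketch only handles '>' specifiers, got {spec!r}")
    release = [int(part) for part in match.group(1).split(".")]
    while len(release) > 1 and release[-1] == 0:
        release.pop()  # treat 1.0 and 1.0.0 as the same release
    return tuple(release), bool(_PRE_SUFFIX.match(match.group(2)))


def simplify_lower_bounds(specs: list[str]) -> Simplified:
    if not specs:
        return Simplified([], False)
    parsed = [_parse(spec) for spec in specs]
    releases = [release for release, _ in parsed]
    tightest = max(releases)
    if releases.count(tightest) > 1:
        # Several bounds share a release: their order depends on the suffixes,
        # which this sketch does not model, so refuse to simplify.
        return Simplified(list(specs), False)
    keep = releases.index(tightest)
    dropped_pre = any(is_pre for i, (_, is_pre) in enumerate(parsed) if i != keep)
    return Simplified([specs[keep]], dropped_pre)


result = simplify_lower_bounds([">1.0a1", ">2.0"])
print(result.specifiers, result.dropped_prerelease)
```

The caller can then decide what the flag means, for example by passing prereleases=True when rebuilding the set.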
FYI, I also have a use case for this.
I have an idea to inspect specifiers to help optimize resolving requirements. Simplifying specifiers first would reduce complexity, and having unambiguous answers to the questions raised here would save having to make guesses on how they should behave.
Yes, this was prompted by https://github.com/pypa/packaging/issues/760, because a specifier like >1.0, <1.0 is empty, but the check suggested in that issue wouldn't identify that. Simplifying and then checking would.
At the moment, I'm assuming a separate method, simplified = spec.simplify(), but maybe there's a case for maintaining specifier sets in simplified form at all times? The only methods that create specifier sets are the constructor and __and__, and it would be easy enough to simplify as a final step in each of those.
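As a toy illustration of the "always simplified" idea, the hypothetical class below normalizes in its constructor and routes `__and__` through that constructor, so every instance is in simplified form. It is not SpecifierSet: it only holds "<" bounds, and versions are plain tuples of ints.

```python
class UpperBoundSet:
    """Hypothetical stand-in for a specifier set that only holds '<' bounds."""

    def __init__(self, *bounds: tuple[int, ...]) -> None:
        # Simplify as the final step of construction: only the smallest
        # "<" bound can ever affect the result.
        self.bounds = (min(bounds),) if bounds else ()

    def __and__(self, other: "UpperBoundSet") -> "UpperBoundSet":
        # Combining goes back through the constructor, so the result is
        # simplified without any extra work here.
        return UpperBoundSet(*self.bounds, *other.bounds)

    def __repr__(self) -> str:
        return f"UpperBoundSet{self.bounds!r}"


combined = UpperBoundSet((3, 0)) & UpperBoundSet((1, 0), (2, 0))
print(combined)
```

One trade-off of this design is that the original text of the specifier set is lost, which may matter to callers that print it back to the user.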
I'm not 100% sure what the packaging library's policy is on whether this matters when checking if a version satisfies a specifier.
OK, so I tested:
>>> from packaging.specifiers import SpecifierSet
>>> SpecifierSet(">=1.0.0a1,>1.1.0").contains("1.3.0a1")
True
>>> SpecifierSet(">1.1.0").contains("1.3.0a1")
False
>>> SpecifierSet(">1.1.0", prereleases=True).contains("1.3.0a1")
True
To be correct, it looks like SpecifierSet(">=1.0.0a1,>1.1.0") would have to simplify to SpecifierSet(">1.1.0", prereleases=True). But I'm a little concerned that the default handling of prereleases (when it's not specified, or None, in the constructor arguments) doesn't behave as I expected:
>>> SpecifierSet(">1.1.0a1").prereleases
False
I'd have been perfectly happy with either True or None here - but False just seems wrong to me. Can one of the maintainers explain the logic here?
It appears that the logic is here - only inclusive operators count as "explicitly requesting" a pre-release, so >1.1.0a1 does not match pre-releases. I can't say that I particularly agree with this interpretation, but it's deliberate and explicit, so I guess I'll need to follow it when simplifying.
Yes, this was prompted by https://github.com/pypa/packaging/issues/760, because a specifier like >1.0, <1.0 is empty, but the check suggested in that issue wouldn't identify that. Simplifying and then checking would.
Wait. It appears that I misunderstood what "empty" meant in that issue. SpecifierSet("") is empty in the sense that its length is 0. But this means that it contains every version (it has no specifiers that limit the allowed versions). What I was thinking was that it was empty because it contained no versions.
I can't actually think of a way to write a specifier that is empty in the sense that it contains no versions, that is simpler than <v,>v (for any version v). Is that correct?
Correct, I had a think about that in this issue: https://github.com/pypa/packaging/issues/762.
You probably got this misunderstanding from me, as I had that misunderstanding when I wrote https://github.com/pypa/packaging/issues/760. A "simplify" would still be very useful, e.g. ">2,>1" gets simplified to ">2".
But what is also needed, at least for my use case, is the ability to ask "is this Specifier Set completely contradictory, i.e. are there no possible versions that could match it?". This would have to be a different method from "simplify", as the result is not itself expressible as a Specifier Set. This method may also need to be careful with its return values; it probably needs an explicit "can not determine" value.
Apologies if this is outside the scope of what you were thinking.
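A rough sketch of what a check with an explicit "can not determine" value might look like. All names here (`Satisfiable`, `Bound`, `is_satisfiable`) are hypothetical and nothing comes from the packaging API. Bounds are reduced to a release tuple with trailing zeros stripped, an inclusive flag, and a pre-release flag; the YES branch assumes some final release always lies strictly between two different releases.

```python
from enum import Enum
from typing import NamedTuple


class Satisfiable(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"  # the explicit "can not determine" value


class Bound(NamedTuple):
    release: tuple[int, ...]  # assumed normalized, e.g. (1,) for both 1 and 1.0
    inclusive: bool           # True for ">=" / "<=", False for ">" / "<"
    is_prerelease: bool = False


def is_satisfiable(lower: Bound, upper: Bound) -> Satisfiable:
    if lower.release > upper.release:
        # e.g. ">2,<1": contradictory whatever the pre-release details are.
        return Satisfiable.NO
    if lower.release < upper.release:
        # Assumption: a final release exists strictly between the two bounds.
        return Satisfiable.YES
    if lower.is_prerelease or upper.is_prerelease:
        # Same release with a pre-release on the boundary: this sketch
        # declines to guess rather than risk a wrong answer.
        return Satisfiable.UNKNOWN
    # Same final version on both sides: only ">=v,<=v" admits v itself.
    both_inclusive = lower.inclusive and upper.inclusive
    return Satisfiable.YES if both_inclusive else Satisfiable.NO


print(is_satisfiable(Bound((2,), False), Bound((1,), False)))
```

Returning an enum rather than a bool forces callers to handle the undetermined case explicitly instead of treating it as "satisfiable" by accident.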
I'd have been perfectly happy with either True or None here - but False just seems wrong to me. Can one of the maintainers explain the logic here?
SpecifierSet.prereleases encapsulates the logic to handle both the value explicitly specified via kwarg and the value determined by inspecting the specifier.
I'm pretty sure the answer is just that when I wrote the code 10 years ago, I was trying to prevent !=1.0a1 from triggering the "should include the prerelease" logic, and so I included a check not just on what versions were mentioned in the specifier, but whether that specifier logically was "including" that version or whether it was "excluding" that version.
I think this makes general sense: !=1.0a1 is explicitly saying you don't want that version, so it is not requesting prereleases, while ==1.0a1 is explicitly saying you do want that version, so it is requesting prereleases.
The > and < operators kind of sit in a weird place though. >1.0a1 is not requesting 1.0a1, it's requesting anything of a higher version than that, so it currently does not trigger the "user has requested prereleases" logic, but that creates a weird asymmetry with >=1.0a1. To make matters worse, it's not possible to get a prerelease version that is less than 1.0a0.dev0, so does <1.0a0.dev0 include or exclude prereleases?
I don't remember thinking too hard about it, and just did the ones that explicitly included the pre-release that was being mentioned, which meant that > and < were excluded from being considered as triggering the "user has requested a prerelease" logic... but I don't feel really strongly about that one way or another.
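The rule described above can be sketched in a few lines. This is an illustration, not the library's code: `requests_prereleases` and both regexes are hypothetical, and the set of "inclusive" operators is limited to the ones named in this thread ("==", ">=", "<="), which may be narrower than what the real implementation treats as inclusive.

```python
import re

# Operators treated as "explicitly including" the version they mention.
# Assumption: the real library may count more operators than these three.
INCLUSIVE_OPERATORS = {"==", ">=", "<="}

_SPEC = re.compile(r"^\s*(===|==|~=|!=|<=|>=|<|>)\s*(\S+)\s*$")
# Release digits followed by at least one a/b/rc/dev segment, e.g. 1.0a0.dev0.
_PRERELEASE = re.compile(r"^\d+(?:\.\d+)*(?:\.?(?:a|b|rc|dev)\d*)+$", re.IGNORECASE)


def requests_prereleases(spec: str) -> bool:
    """True if the specifier inclusively names a pre-release version."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"unrecognised specifier: {spec!r}")
    operator, version = match.groups()
    return operator in INCLUSIVE_OPERATORS and _PRERELEASE.match(version) is not None


for spec in ("==1.0a1", "!=1.0a1", ">=1.0a1", ">1.0a1"):
    print(spec, requests_prereleases(spec))
```

The asymmetry between ">=1.0a1" and ">1.0a1" falls directly out of the operator check, which is the behaviour questioned earlier in the thread.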
No, it's precisely in scope of what I'm thinking about. And the result is expressible as a specifier set (after all, ">0,<0" is precisely such a set) - it's just that there's no immediately obvious "canonical" answer. A simplifier could certainly just pick a value and consider it canonical (and reduce any other provably-unsatisfiable specifier sets to that value).
What's actually far more of an issue is the handling of prerelease versions, which I'm starting to think may actually make this impossible. Consider the following example:
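>>> from packaging.specifiers import Specifier, SpecifierSet
>>> from packaging.version import Version
>>> Version("0.dev0") in Specifier("<0")
False
>>> Version("0.dev0") in Specifier("<=0.dev0")
True
>>> Version("0.dev0") in SpecifierSet("<0,<=0.dev0")
False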
Pretending for a moment that we don't realise that things are going to get messy because "0.dev0" is a pre-release, the above is perfectly reasonable. However:
>>> Version("0.dev0") < Version("0")
True
So, of the two upper bounds, "<=0.dev0" is the tighter. However:
>>> Version("0.dev0") in SpecifierSet("<=0.dev0")
True
The problem here is that "<0" is actually a tighter bound in the sense that it excludes pre-release versions even if they are <0. And setting the prereleases flag on the specifier set doesn't address this (or at least, I can't prove to my satisfaction that it does) because the set would then behave differently (you can't combine it using & with a set that says prereleases=True for example).
I don't remember thinking too hard about it, and just did the ones that explicitly included the pre-release that was being mentioned, which meant that > and < were excluded from being considered as triggering the "user has requested a prerelease" logic... but I don't feel really strongly about that one way or another.
Yeah, I'm pretty sure the whole "prereleases" logic (here, in the spec, and in pip) was all based on an instinctive desire to have things "do what I mean", but failed to create something that was mathematically or logically consistent. Regrettably, though, I think we're probably stuck with what we have, short of a "Version specifiers 2.0" like you mentioned on Discourse...
Yea, in hindsight the logic probably should have been either simplified or pushed to the edges (e.g. in pip/poetry/etc), and left the core library and spec alone to implement the logically consistent thing.
I guess a fair question for the packaging maintainers is whether they (you) would support a breaking change that did just that - made prereleases just like any other version, and left it to tools to implement the "do what I mean" semantics.
To be clear, I'm not even sure that's possible, but unless the maintainers have the stomach for that level of breakage, it's not worth putting effort into trying to find a workable approach. (Personally, I expect the answer to be "no, this is too much for too little benefit" - I know I wouldn't be happy trying to deal with the knock-on effects of such a change in pip).
Brett and Pradyun's opinions should weigh much higher than mine, since they've been the primary maintainers for a while now.
Personally I think it's something we should do, though I have a bit of a preference for coupling it to the hypothetical PEP 440v2. If someone felt strongly about getting that particular change out and figured out a good way to manage the breakage, I wouldn't (personally) be super upset about that.
But again, I'd consider my opinion to be less that of a maintainer of packaging at this point and more that of a guy who's been around for a while :)
The problem here is that "<0" is actually a tighter bound in the sense that it excludes pre-release versions even if they are <0. And setting the prereleases flag on the specifier set doesn't address this (or at least, I can't prove to my satisfaction that it does) because the set would then behave differently (you can't combine it using & with a set that says prereleases=True for example).
Yes, I arrived at similar concerns through different means. For example, I was worried that for something like ">0,<0", there might actually be versions that would pass this Specifier Set if one of the constituent Specifiers was created with prereleases=True (it seems from this discussion that's not the case in this example).
I still think it's possible to determine in many cases if a Specifier Set is contradictory, e.g. ">2,<1"; in this case, it doesn't matter if one of them is a pre-release. But, yeah, the boundary conditions are particularly difficult, exponentially so because of pre-releases.
I had an idea of writing a testing mechanism that generates all possible versions around a boundary version (e.g., 1.0), up to some topological equivalence. Then, it creates all possible Specifiers around that boundary version and checks what versions are contained in that specifier. This can be used to test various things, like if Specifier Sets are logically consistent with their underlying Specifiers, and to test a function that determines if a Specifier Set is completely exclusionary or not. I have this mechanism mostly working, but there were some nuances I hadn't ironed out which this thread, I think, should immensely help with when I have a moment to work on it again.
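A heavily simplified sketch of that kind of brute-force check, to show the shape of the idea rather than the actual mechanism. It assumes versions are plain tuples of ints, uses one representative candidate per region (below, at, and above the boundary), and ignores pre-, post- and dev-releases, which are exactly the cases that make the real problem hard. The names `matching` and `contradictory_pairs` are hypothetical.

```python
import operator
from itertools import combinations

Version = tuple[int, ...]
Spec = tuple[str, Version]

# Toy operator table; "~=" and "===" are left out of this sketch.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def matching(specs: list[Spec], candidates: list[Version]) -> list[Version]:
    """Return the candidates that satisfy every specifier."""
    return [v for v in candidates if all(OPS[op](v, bound) for op, bound in specs)]


def contradictory_pairs(boundary: Version, candidates: list[Version]) -> list[tuple[Spec, Spec]]:
    """List every pair of specifiers around `boundary` that matches no candidate."""
    specs = [(op, boundary) for op in OPS]
    return [
        (a, b) for a, b in combinations(specs, 2) if not matching([a, b], candidates)
    ]


boundary = (1, 0)
candidates = [(0, 9), (1, 0), (1, 1)]  # one below, the boundary itself, one above
for a, b in contradictory_pairs(boundary, candidates):
    print(a, b)
```

A pair flagged here is only contradictory with respect to the chosen candidates, so the candidate list has to cover every region that the version ordering distinguishes.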
FYI I've made a PR (https://github.com/pypa/packaging/pull/794) to follow the spec, which will also fix this imbalance.