haxelib icon indicating copy to clipboard operation
haxelib copied to clipboard

Adds SemVer Comparisons

Open onefifth opened this issue 10 years ago • 27 comments

Allows for SemVers to be compared against each other and adds support for SemVers with >, <, <=, >= and = modifiers to define open ended version ranges. This could eventually be leveraged when defining and validating package dependencies versions in haxelib projects.

onefifth avatar Jul 25 '13 07:07 onefifth

We need to discuss exactly how we want to deal with the inequality modifiers before we pull this in. (see also)

Some questions we need to answer

What operators do we support?

Necessary
  • < - Great for when you know the first breaking change, if you know your app works with version 1, but not 2, provides an upper bound, but lets you include bugfixes to version one.
  • >= - Great for showing "this is the first version of the library that works with my app"
Maybe
  • = - Do we include equals, or make an unqualified version number mean equals? See section below
  • != - Only useful if we allow combining restrictions. Very limited use in general, only useful if the library is broken for one version, then returns to being okay.
  • ~> or ~ - This isn't included this pull request, but it's popular in some other dependency management systems: see ruby, ~>major.minor would be equivalent to >=major.minor.0 && <(major+1).0.0
  • > - Unnecessary, but could be included for completeness. If you know the last version of the lib that doesn't work, you will know the first version that does work, and a version could be added in between as a patch that doesn't work. (>1.0.5 is fine until there's a bugfix at 1.0.6, and 1.0.6 is incompatible with your app.
  • <= - Similar to above

Unqualified Versions

There's been some suggestion(@back2dos) to make an unqualified version correspond to ~>, instead of =. This makes it easy to use Semantic Versioning to our advantage. However, it might be confusing if you write version 1.0.5, and get something different.

AND/OR

It is logical to want to combine modifiers, to specify ranges. AND would be used to make a range from 2 or more inequalities, and OR could be used to combine ranges. If we decide to support AND/OR, we should decide how they are denoted:

  • AND: AND, &&, &, (space) (a space denotes AND in ruby)
  • OR: OR, ||, |

Dr-Emann avatar Jul 25 '13 14:07 Dr-Emann

My Vote:

  • Include <, >=, ~>, =
  • An unqualified version corresponds to =
  • Include AND as &&
  • Do not include OR
Reasoning
  • ~> represents the most common use case: "I know my app is compatible with this version, and will probably stay that way until the author makes a large change"
  • < and >= allow for the same constraint as ~>, but with more control. It is possible to specify that your app works with versions 1 through 3, but not 4 and above.
  • Using = for unqualified versions more closely reflects what the developer likely wanted. It would be too much of a surprise have it mean anything else
  • Using && for AND is consistent with Haxe's boolean AND operator. It is easy to split a string on && to get sub-strings that have Semantic Version Constraints.
  • OR is only useful when multiple ranges need to be combined. It is much more likely that an app's compatibility with a library will be contiguous, I don't think it is likely that an app will be compatible with versions 1 and 3, but not version 2.

Dr-Emann avatar Jul 25 '13 14:07 Dr-Emann

I feel the && and || operators (or some variant of) are outside the scope of a single semver and would extend into the scope of a compound one. Meaning the usage would be more like x.satisifies(y) && x.satisfies(z) rather than constructing a complex/compound semver? This would be implemented at a higher level when a list of versions is provided for comparison. Although I am 100% on board with their inclusion in some way for completeness sake. I know I have had dependancies requirements in the past that fit the case outlined @Dr-Emann's last reasoning point. It's not impossible for libraries to introduce a bug for a specific version or range of versions that breaks things.

I'm aware of the missing optional functionality surrounding ~ and ~> and honestly I just wanted to start some discussion/interest in some more contributions to this feature. In addition to these, my personal wishlist includes the ability to define versions like 1.2.x or 2.x allowing for fuzzier matching.

As for operator support, I was sticking closely to the official spec at semver.org and I think that deviating from the outline there would be a poor choice. As far as additional features I think leaning toward a functionality set similar to Node SemVer would be pretty reasonable.

This might be more appropriate to pull this into a branch to mature before merging down?

onefifth avatar Jul 25 '13 15:07 onefifth

Perhaps we should have two separate ideas: A semantic version is a single version, and a semantic constraint describes valid values for a semantic version. Semantic constraints are made of other semantic constraints: e.g >=1.2.3 && <2.0.0 would be a semantic constraint, made of two semantic constraints( >=1.2.3 and <2.0.0 ).

The constraint >=1.2.3 && <2.0.0 is satisfied by the semantic version 1.3.0. In code (chose whatever names and such you like)

SemVer.ofString("1.3.0").satisfies(SemVer.constraint(">=1.2.3 && <2.0.0"))

Dr-Emann avatar Jul 25 '13 17:07 Dr-Emann

We could also create something that acts specifically as a collection of semantic versions? At the very least we need some kind of data structure to store the version data and operator that is separate from the semantic version object. I think it makes sense to enable all instances of SemVer to potentially represent represent multiple semantic versions and instead represent a non-specific version, or as you've named it, a semantic constraint.

There are some additional complexities introduced if we need control over operator precedence for something like >=2.1.3 || (>1.9.8 && <=2.0.0 )

A semantic version and a semantic constraint should be handled identically in terms of comparison and possibly even creation. The only difference would be the way operations like incrementing would be handled on a semantic constraint. This seems like a reasonable approach to me although renaming SemVer or coming up with a different name for the data structure might be worthwhile.

onefifth avatar Jul 25 '13 18:07 onefifth

To me, they are disparate types, that can only occur where they belong (semantic versions only appear when a library declares its own version, semantic constraints appear in the dependencies). Constraints/rules can only be satisfied by a particular version.

It makes sense to compare a version to a version, but I'm not sure how you'd compare >1.0.0 and <=1.5.0. Is the first greater because it can be satisfied by a larger version, or is the second larger, because it starts at a larger version? In your code, you return false if you try to use

SemVer.ofString(">1.0.0").satisfies(SemVer.ofString("2.0.0")) // returns false

but really, this should be a type error: it doesn't make sense, a constraint doesn't satisfy a version, a version satisfies a constraint.

If you want to be able to define constraints like 1.x or the like, you almost have to split the two, also. 1.x is not a valid version, but it could be a valid constraint. The same is true of <=2.0.0: invalid version, valid constraint. (On a quick look over, it looks like you didn't change where we validate versions when submitting libraries, so I think it would be legal in your current pull request to have <=2.0.0 as a version. I could be wrong though, I didn't look too deeply)

Dr-Emann avatar Jul 25 '13 19:07 Dr-Emann

Right now, essentially the difference between a version and a constraint is defined by the result of isSpecific() which just tests for the presence of operator prefixes to the version.

I agree about satisfies throwing an error rather than false when called on a non-specific version. There's no defined way to check if one constraint satisfies another and I think that behaviour, even if explicitly defined, is unnecessarily complex.

Creating a new class SemVerConstraint and then making SemVer inherit from it might be ideal? Because all versions can be treated as constraints (assuming =) but not all constraints are versions. This would allow satisfies to exist only on versions while letting it operate on both constraints and versions.

I might attempt to do some of this work this evening.

onefifth avatar Jul 25 '13 20:07 onefifth

I'm pretty happy with this pull request overall once the question of operators is answered... I'd appreciate @back2dos on that as well before decisions are made.

Thanks heaps @onefifth, hopefully we can merge soon :)

jasononeil avatar Jul 26 '13 07:07 jasononeil

Is it weird to possibly merge this onto a new branch instead so it's more open to contributions? It would allow for contributions while keeping it out of master until it was improved/actually being used. If that's not really the way things are done that's understandable.

I would like to take a stab at adding some form of semantic constraints to support && and || operators joining versions in an intuitive manner before this is merged.

onefifth avatar Jul 26 '13 07:07 onefifth

I don't have commit access, but I have no problem with having a public branch.

Before it's merged, do you think you could pull out the first two commits? I think since it's still only on yours, though, it's the perfect time to do the cleanup, before it becomes public on HaxeFoundation's. If you're not sure how to do that, let me know.

The most natural representation of a constraint/comparison/rule to me would be an enum:

class SemVer
{
    ...
    public function satisfies(constraint:SemConstraint)
    {
        switch(constraint)
        {
            case And(left, right): return satisfies(left) && satisfies(right);
            case Gt(ver): return compare(ver) > 0;
            case ApproxEq(ver): return compare(ver) >= 0 && compare(new SemVer(ver.major + 1, 0, 0, null, null) < 0);
            ...
        }
    }
    ...
    static public function constraintOfString(str:String)
    { ... }
}
enum SemConstraint
{
    And(left:SemConstraint, right:SemConstraint);
    Or(left:SemConstraint, right:SemConstraint);
    Gt(ver:SemVer);
    Lt(ver:SemVer);
    Eq(ver:SemVer);
    ApproxEq(ver:SemVer);
    ...
}

Dr-Emann avatar Jul 26 '13 14:07 Dr-Emann

I rebased and dropped those 2 commits. Good call.

I really like the idea of using an enum for that. I'll make some time in the next couple days to give this a shot if we want to wait on this pull. I can close it and reopen when more ready if that's appropriate.

onefifth avatar Jul 26 '13 15:07 onefifth

I pushed a number of significant changes to add the suggested SemConstraint enum and to allow for the creation of complex constraints using the && and || operators. All semantic versions now have a constraint variable that will store the correct semantic constraint for their prefixed operator. No operator implies = or constraint Eq.

Things that are still not supported:
  • Operators including ~
  • Compound constraints using parentheses for operator ordering. (i.e. >=2.1.3 || (>1.9.8 && <=2.0.0))
  • Wild version constraints such as 1.2.x or 1.2.*

I updated the RegEx to correctly parse the operators (~, ~=, ~> and ~<), they are just considered invalid when the constraint construction is attempted. They should be simple to implement later when we decide on their behaviour.

I think the next step would be making the appropriate changes to the actual Haxelib code so that these comparisons are utilized when doing dependancy checks/version verifications. Ideally all current usage should remain identical and this will just expand on the core feature-set. I'm fairly comfortable leaving out any ~ operators for now rather than changing their potential implementation later.

onefifth avatar Jul 28 '13 05:07 onefifth

I still don't really like that SemVer.ofString(">1.2.3") doesn't complain that ">1.2.3" isn't a version. I also don't like that there's two ways to make a constraint from a string: SemVer.ofString(">1.2.3") or SemVer.constraintOfString(">1.2.3"), the former of which is almost never useful because it doesn't deal with compound constraints properly (in fact, it looks like it will happily throw away everything but the first constraint without warning. You can get things like SemVer.ofString("5.0.0").satisfies(SemVer.ofString(">1.2.3 && <2.0.0").constraint) == true) I think they should be totally separate. I'm fine with constructing a constraint using a static method on SemVer, but a semantic version shouldn't have an isSpecific method, or a comparator property. A version is a single version, a constraint describes a set of possible versions.

Dr-Emann avatar Aug 02 '13 19:08 Dr-Emann

Is there hope for a consensus here? This feature would be really nice to have.

Simn avatar Nov 27 '13 11:11 Simn

I don't feel like there's a tonne of work getting this to a state where it's considered mergeable, I agree with a lot of the things pointed out by @Dr-Emann above.

Essentially we want to stop allowing SemVers to be treated as Constraints (other than for the purpose of direct equality?) and be more strict on the creation of constraints.

I may attempt to touch some of this again tonight as I think I'll have a little time.

onefifth avatar Nov 28 '13 21:11 onefifth

Hey so, I took a stab at these changes and believe I addressed the issues brought up by @Dr-Emann I encourage people to look through my test coverage and suggest any necessary additions.

Note that this is still just the functionality, classes and tests required to integrate Semantic Versioning into Haxelib and none of the actual implementation work.

onefifth avatar Nov 29 '13 06:11 onefifth

This is all nice, but I think it approaches the problem from the wrong direction.

If we simply allow arbitrary constraints, the complexity of getting compatible dependencies is horrible, especially if we add wildcards to the mix. You start with a range for one lib, that matches N versions. For every of those versions you have M dependencies (in fact the number needn't be constant). Because for each of those M dependencies you're to have to pick K versions, making sure that each of the L dependency constraints they have are compatible. This basically means that we'll have to brute force through an exponential problem. It's also not made easier by the fact that in principle one should prefer the versions that are already installed client side.

And this better be really fast, since when the compiler calls haxelib path lib_foo:>=1.2.3 to get autocompletion, we want to be done instantaneously.

Now a good question is: how does npm do it?

  1. For one, npm is not global. It will install libraries relative to you project root (whatever contains the package.json). One single version per library. Once npm is done fetching the libraries, it really is done. You could just as well add stuff to ./node_modules by hand.
  2. Judging from node-semver, the strategy seems to be to get maximum versions for a given range. I'm not exactly sure this is a good idea. I can spin a lot of worse case scenarios, but I'm not really clear on how much we should care.

Either way, I think we need to solve this conceptionally, before adding code.

Since we can't even agree on using SemVer for the compiler, I'm not sure what assumptions are safe to be made when building such a system.

back2dos avatar Nov 29 '13 11:11 back2dos

I do agree with these points, but also I feel this is a step in a correct direction over the current implementation, which is to have absolutely no way of comparing library versions other than as exact strings.

It's also worth noting again, this pull request does not contain any of the implementation required to have haxelib actually use SemVers, rather just the tools to make the options available. Creating a local version list, caching dependancy trees on disk, where to store packages per project, are all not directly relevant to the contents of this request.

At the very least the ability to parse and compare library versions could be used to close #83 and provide some additional niceties, even if this approach is not deemed acceptable for general haxelib use.

onefifth avatar Nov 29 '13 16:11 onefifth

I think it's well past time we revisit this issue. Building off of this comment, I propose the following. These are fairly large changes to what is currently in this pull request, however, I think it's a good compromise between expressiveness and limiting arbitrary complexity.

Semantic Constraint

A semantic constraint defines a set of valid semantic versions. It can be described by an enum of 3 possible types: Exact, Range, and Or.

Semantic constraints must support at least the following operations:

  • Test if a semantic version v satisfies semantic constraint a
  • Intersection between semantic constraints a and b: returns a new semantic constraint which defines the set of versions that satisfy both a and b
    • What do we do if the set is empty? e.g. the intersection of =3.2.1 and =4.2.1. There are no versions which can satisfy both constraints. Should we return null? Throw?

Exact(SemVer)

An exact constraint constraint consists of an optional =, followed by a semantic version. The constraint =a is satisfied by version v iff a is equal to v

Examples

=3.2.1-alpha
3.2.1-alpha
=3.2.1
4.0.0

Range(SemVer, SemVer)

A range consists of a partial semantic version, followed by a required - (surrounded by white space), followed by a second partial semantic version. The first version when zero-extended is interpreted as the lower bound (inclusive), and the second version when zero-extended is the upper bound (exclusive). The constraint a - b is satisfied by version v iff a <= v < b is true. a MUST BE strictly less than b, or a - b is not a valid constraint.

A range can also be formed by appending a + to a partial semantic version. This range spans from version created by zero-extending the specified partial version, to the version created by incrementing the next highest section by one, and resetting all subsequent sections to zero (or, if there is no higher section (e.g. 3+), the upper bound is infinite).

E.g: 3.2.1+ is equivalent to 3.2.1 - 3.3.0, 3.2+ is equivalent to 3.2.0 - 4.0.0, and 3.2.1-alpha+ is equivalent to 3.2.1-alpha - 3.2.2. 3+ means all versions above 3.0.0

Notes

Ranges look weird if either of the versions contain dashes (3.2.1-alpha - 3.3.2). Consider alternative syntax:

  • [3.2.1-alpha, 3.3.2] Ranges are often shown with brackets. Whitespace not required.
  • 3.2.1-alpha ... 3.3.2 Looks like Haxe's range syntax for iterators. Still requires whitespace.

Examples

3.2.1-alpha - 3.3.5
3.2 - 5      # equivalent to '3.2.0 - 5.0.0'
3.2.1-alpha+ # equivalent to '3.2.1-alpha - 3.2.2'
3.2.1+       # equivalent to '3.2.1 - 3.2.2'
3.2+         # equivalent to '3.2.0 - 4.0.0'
3+

Or(SemanticConstraint, SemanticConstraint)

An 'or' constraint consists of a semantic constraint, followed by a required | (optionally surrounded by white space), followed by another semantic constraint. The constraint a | b is satisfied by version v iff a or b is satisfied by version v.

Examples

3.2.1-alpha | 3.3+
3.2.1-alpha - 3.3.5 | 4.2+
1.6.0 | 3.2.1-beta+ | 3.4+

Partial Semantic Version

A partial semantic version is formed by leaving off zero or more parts of a full semantic version from the right side. Every valid semantic version is also a valid partial semantic version.

Partial Semantic Versions must support at least the following operations:

  • zero-extension: Return a full semantic version, by replacing any missing parts with zero.

Examples

3.2.1-alpha  # zero-extends to 3.2.1-alpha
3.2.1        # zero-extends to 3.2.1
3.2          # zero-extends to 3.2.0
3            # zero-extends to 3.0.0

Dr-Emann avatar Apr 09 '14 06:04 Dr-Emann

I think it's important to look at this in the context of its application.

The whole idea is kind of inspired by npm. Unfortunately, haxelib is nothing like npm. Mostly because Haxe is nothing like JavaScript. In Haxe we have modules, that are considered global by the compiler. In JavaScript you just load other JavaScript from wherever you want to at runtime.

While haxelib has little choice to be global (at least in the scope of a single project), npm can afford to be local. With npm, every package is installed locally (in relation to the project) and has its own dependencies (installed locally in relation to the dependency). Therefore npm doesn't have to resolve conflicts between different packages having incompatible dependency ranges. Each package just loads its dependency separately.

As I said a while back, resolving such conflicts is anything but trivial. Say your project depends on lib_a, lib_b and lib_c directly, under the version constraints v_a, v_b and v_c. Say that also for some reason lib_a and lib_b depend on lib_c. For every version of lib_a that gives us a constraint we'll call v_a_c and we get a v_b_c analogously. Intersecting two constraints is trivial. However to determine incompatibility (which we should be able to do), we have to combine every v_a_c with every v_b_c to find no pair that when intersected with v_c leaves a non-empty set.

We have to do this, because we might have to expect that [email protected] depends on lib_c@[2.3.4-3.0.0] and then [email protected] depends on lib_c@[2.4.2-2.6.0] because in [email protected] some actually unspecified behavior needed to be changed and that breaks things in a way unforeseen at the time of [email protected].

Imagine you have 30 libs in a project, some as small and ubiquitous as promhx or msignal with 20 of the other libs depending on them, then this becomes a serious problem.

Unless somebody can come up with a solution (that would probably start with making the version information incrementally downloadable and solving the problem on the client side), we can't go there.

So I'm left with proposing once again that we embrace the semantics of SemVer, which is why to my knowledge we made the move in the first place. We can then make some optimistic assumptions that allow us finding a result rather quickly and fail otherwise (even though there may be some older version that has a wildcard dependency that would allow satisfying the constraints but would probably lead to nonsense anyway). In fact the haxelib ecosystem largely runs on very optimistic dependencies right now and already gets something done.

I suggest optimizing for the intended use case first and then we can start adding more of them, if we know why and how. To add support for optimistic dependencies, i.e. X.Y.Z - (X+1).0.0 (or X.Y.Z+ as you've also denoted it) is reasonable and I think the most important part here, while not too hard. It allows library authors to denote breaking changes through major releases. So I say:

  1. We don't allow arbitrary upper bounds for now, because that's contrary to SemVer semantics (as I said the next major version should be the upper bound).
  2. We don't allow "or" because for now we say that if you have a different major dependency, then you need a new major version.
  3. When people use that and give us actual feedback we can see about supporting more complex stuff. Or we might be able to agree as a community that these relatively restricted semantics actually get the job done for most people and the rest is ok with using wildcards.

If libraries are small enough, then this is absolutely ok. And if they were, we probably wouldn't have this whole NME vs. OpenFL debate but instead generally more reuse among community members. I think we should support such a culture. We can afford to, because it comes at no cost. People can still just use haxelib the way they do now, so we wouldn't be forcing anyone to do anything.

Of course if even we can't agree that this is the right way to go, then we must find something else. But then please do devise a scalable solution that covers how we're actually going to retrieve compatible libraries. Syntax is really the least of our problems.

back2dos avatar Apr 09 '14 08:04 back2dos

A couple of notes.

  1. Restricting to only opportunistic dependencies doesn't prevent us from having to walk all versions. If we require lib_a, lib_b and lib_c@[3.2.1 - 4.0.0], [email protected] might require [email protected] and [email protected] might require [email protected]. See 2 for why this is reasonable.
  2. Dependencies are not part of the public API (see semver spec FAQ), and therefore it is valid to update lib_a's dependancies in a minor or patch version. Even if we go against the spec, and define dependencies as part of the public API, how do we handle a library that does so? Use only the most recent list of dependencies? The first list of dependencies that came for a certain major version? Or check, when a new version is uploaded, that it uses the same dependencies, or force it to be a new major version?
  3. In a perfect world, yes, all libraries would contain only the smallest units, and the API would be so small that any breaking change would necessarily invalidate any library that depended on it. This is definitely not the case in the real world, some of the most popular libraries tend to be large, 'batteries included' libraries (not only in haxe. See Apache Commons, or Google Guava). I don't think lib_a@[3.2+] is enough, when the breaking changes from [email protected] don't bother my library. If we don't allow arbitrary ranges, we at least need a way to say lib_a for version 3 and version 4. (concrete example: the caching api in google guava hasn't changed from version 15.0.0 to 16.0.0)

I would be ok with removing the 'Or' semantic constraint, if you're looking to reduce complexity. I think we either need a way of specifying an arbitrary range, or a way to specify a range that ends more than 1 major version away. (maybe limit the upper bound to a partial containing only a major version: 3.2.1-alpha - 5)

Dr-Emann avatar Apr 09 '14 16:04 Dr-Emann

Commit 0cfc177 is related to this.

onefifth avatar Jan 04 '16 23:01 onefifth

I have no idea what to do with this one, and feel like closing it for the time being.

nadako avatar Mar 09 '16 19:03 nadako

While important, just checking if a semver is in a "semver set" is not enough, we need to be able to specify a set in haxelib.json, and something a lot smarter than right now to use (#290).

Otherwise something like this, we want to install something depending on libA and libB: libA depends on libC (>= 1.0.0 && < 1.1.0) || (>= 1.2.0 && < 1.3.0) libB depends on libC >= 1.0.0 && < 1.2.0) then libA will try to install the latest valid libC (by projecting on how that's currently done) say 1.2.5, libB will do the same and end up with libC 1.1.7, and our lib cannot be used (should have installed something like 1.0.1).

Alright this example was crated to fail and is very specific, it can still happen.

I think we need more design on this, and I do think it's an important feature, before implementation.

ibilon avatar Mar 12 '16 11:03 ibilon

This was originally written as purely an implementation giving haxelib the ability to compare semvers with the intent of extending it into the kind of dependency solving @ibilon describes above. When this PR was written, haxelib was using string comparison and did not have a way to even compare lib versions at all.

Lots has changed, there is now a basic implementation of semantic versioning in place and this PR is turning into much higher level discussion about how we want to solve dependencies rather than just defining and comparing.

If some minor work should be done to improve this solution, or at the least make this merge-able again, I would be happy to contribute if some decisions were made on exactly what needs to be accomplished.

onefifth avatar Mar 12 '16 16:03 onefifth

I think we shouldn't proceed with merging this, but continue design discussions (not now tho, but after 3.3 release). Too bad Github doesn't have an option to convert a PR to normal issue.

nadako avatar Mar 12 '16 16:03 nadako

hi am i late

Jotaro-Gaming avatar Dec 10 '21 20:12 Jotaro-Gaming

At this point I'll close this old PR.

Simn avatar Oct 22 '23 05:10 Simn