Juleps Pkg3: immutability of compatibility

Continuing half of the discussion on https://github.com/JuliaLang/Juleps/issues/3.

Nov 15 '16 17:11 StefanKarpinski

If we allow the compatibility of versions to be mutated after the fact (as we do now in METADATA), one major issue is that once compatibility has been modified, it becomes impossible to know what the compatibility constraints on versions actually were when those versions were resolved. This could hide resolution bugs and generally makes the system harder to understand.

One possible solution is for each modification of compatibility constraints to increment a build number of a version or something like that, so 1.2.3 is the version with its original compatibility, while 1.2.3+1 would be a version with potentially modified compatibility or other metadata changes, which would get its own metadata in the registry, but share the same source tree.

At that point, however, I have to question why 1.2.3+1 wouldn't simply be called 1.2.4. The main objection seems to be that it's annoying / hard to create patches and package maintainers often aren't as responsive as we'd like, which makes me think that we should just make it easier to make this kind of patch update and make it possible without the package maintainers' involvement.

Nov 15 '16 17:11 StefanKarpinski

In particular, patches don't need to be made on the main repository of a project, they can be made on a fork as long as they are eventually upstreamed back to the main repo.

Nov 15 '16 17:11 StefanKarpinski

+1 UUIDs

Nov 15 '16 19:11 JeffreySarnoff

The reason for distinguishing a compatibility-only change from a patch change is that you may need to make the former long after the fact when there have already been later patch releases.

The version history of metadata currently would allow you to reconstruct the state of compatibility (assuming no local metadata modifications have been made), though which commits of metadata are used is not recorded long term.

Nov 15 '16 20:11 tkelman

> The reason for distinguishing a compatibility-only change from a patch change is that you may need to make the former long after the fact when there have already been later patch releases.

If the latest patch release always supersedes previous ones in the same major-minor series, then you can always just make a new patch. The only way needing 1.2.3+1 rather than 1.2.19 makes sense is if you want a version with compatibility fixes but without any bugfixes. That seems like a somewhat implausible situation. How would this be necessary? If such a situation did occur, we could always allow publishing 1.2.3+1 with updated compatibility but without bug fixes.

> The version history of metadata currently would allow you to reconstruct the state of compatibility (assuming no local metadata modifications have been made), though which commits of metadata are used is not recorded long term.

That means we'd have to record the state of all registries in the environment, which ties the meaning of an environment to the history of registries in a way that we are (or at least I am) trying to avoid. If version compatibility is immutable (in either 1.2.3+1 or 1.2.4 form), then you can always tell just by looking at the compatibility info for those versions whether they are correct. You can't tell if they were optimal at the time, but you can verify correctness.

Nov 15 '16 21:11 StefanKarpinski

> If the latest patch release always supersedes previous ones in the same major-minor series

This is not a good idea, as I've said before - there's not a lot of precedent for allowing code changes to completely supersede old versions. If there's going to be a second class of dependency resolution for complete replacement, then it should not be allowing code changes. People break their API in bugfix releases even if we tell them not to, and downstream packages are going to need to be able to use APIs that only existed in early patch releases. And this situation might not be noticed immediately, so there could be enough later patch and minor releases that there isn't room to fix the situation by making a new set of renumbered releases.

Nov 15 '16 21:11 tkelman

So are you ok with the idea of version metadata – especially compatibility – being immutable, but having 1.2.3+1 supersede 1.2.3 with no source code changes, only metadata changes?

Nov 15 '16 22:11 StefanKarpinski

Yes, that seems like a mostly equivalent way of accomplishing the same thing as modifying compatibility in metadata. It records more history permanently (not just in git history), maybe that could be useful though.

Nov 15 '16 22:11 tkelman

I do think we should keep a log of version history used by local registry copies over time, so you could feasibly implement an "undo" of a global update operation. That's a separate issue though.

Nov 15 '16 22:11 tkelman

Or are you entirely against the idea that version metadata be immutable?

Nov 15 '16 23:11 StefanKarpinski

Creating such a metadata-only update would be simplified if the metadata was only part of the registry, not the package itself, i.e. 1.2.3+1 could have the same hashes stored as 1.2.3. Actually, it would have to, to enforce the "no source code changes" policy. This would a) allow easy automatic verification of this policy and b) simplify metadata-only updates by non-package-maintainers.
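
A minimal sketch of the automatic check described above, assuming a registry record that stores a version string, a source tree hash, and compatibility constraints. The `RegistryEntry` type, its field names, and the hash values are hypothetical and not part of any proposal in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """One immutable registry record; all field names are hypothetical."""
    version: str     # e.g. "1.2.3" or "1.2.3+1"
    tree_hash: str   # hash of the source tree this entry points at
    compat: tuple    # compatibility constraints, e.g. (("A", "1.3"),)


def base_version(version: str) -> str:
    """Strip a '+N' metadata revision suffix: '1.2.3+1' -> '1.2.3'."""
    return version.split("+", 1)[0]


def is_valid_metadata_revision(original: RegistryEntry, revision: RegistryEntry) -> bool:
    """A revision may change compat, but must keep the base version and source hash."""
    return (
        "+" in revision.version
        and base_version(revision.version) == base_version(original.version)
        and revision.tree_hash == original.tree_hash
    )


v123 = RegistryEntry("1.2.3", "abc123", (("A", "1.2"),))
v123_1 = RegistryEntry("1.2.3+1", "abc123", (("A", "1.3"),))  # compat fixed, same source
bad = RegistryEntry("1.2.3+2", "def456", (("A", "1.3"),))     # source changed

print(is_valid_metadata_revision(v123, v123_1))
print(is_valid_metadata_revision(v123, bad))
```

Because the check only compares hashes stored in the registry, it could be run by someone who is not the package maintainer.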

Would that be an option? (Or is that already the idea and I misread the proposal?)

Nov 16 '16 08:11 martinholters

The example I gave in the other thread illustrates why patches are insufficient:

Pkg B v2.0.0 depends on v1.2 of Pkg A
Pkg C v3.0.0 depends on v1.2 of Pkg A
Pkg A v1.3.0 is tagged with new features
Pkg B v2.1.0 is tagged using features of Pkg A v1.3.0, but forgets to update the version requirement
Pkg B v2.1.1 is tagged fixing this.

Now user installs Pkg B and Pkg C: the end result would be:

Pkg A v1.2.x (as this is the latest version compatible with Pkg C)
Pkg B v2.1.0 (as this is the latest version compatible with Pkg A v1.2)
Pkg C v3.0.0

which would be broken.

Nov 16 '16 14:11 simonbyrne

@martinholters: Yes, having compatibility info not live in the package repo is definitely a possibility, but it would make it harder for unregistered packages to participate in version resolution. Since making unregistered packages easier to work with was one of the major requests for Pkg3, that's a bit of a problem. Also, if we move compatibility info out of the package itself, where does the developer edit it? The obvious answer is in the registry but I feel like that's not tremendously obvious or developer-friendly.

@simonbyrne: This wouldn't be the result under what I've proposed since the existence of Pkg B v2.1.1 would prevent resolution from ever choosing Pkg B v2.1.0 – that's what "strongly favor the latest patch release" is meant to convey. Instead you would get A v1.2.x, B v2.0.0 and C v3.0.0. In the other approach being discussed here, B v2.1.0+1 would fix B v2.1.0's dependencies and would similarly hide B v2.1.0 from consideration when resolving new versions.
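
The two outcomes can be compared with a small brute-force sketch. It assumes requirements are stated at major/minor granularity, that Pkg A has only the versions 1.2.0 and 1.3.0, and that "prefer the highest versions" means taking the lexicographically largest valid combination. The `REGISTRY` layout and the function names are hypothetical; this is not how any real resolver is implemented.

```python
from itertools import product

# version -> {dependency: required (major, minor)}
REGISTRY = {
    "A": {(1, 2, 0): {}, (1, 3, 0): {}},
    "B": {
        (2, 0, 0): {"A": (1, 2)},
        (2, 1, 0): {"A": (1, 2)},  # really needs A 1.3; requirement was not updated
        (2, 1, 1): {"A": (1, 3)},  # patch that fixes the requirement
    },
    "C": {(3, 0, 0): {"A": (1, 2)}},
}


def latest_patches(versions):
    """Keep only the highest patch within each (major, minor) series."""
    best = {}
    for v in versions:
        best[v[:2]] = max(best.get(v[:2], v), v)
    return sorted(best.values())


def resolve(registry, supersede_patches):
    """Try every combination of candidate versions and return the highest valid one."""
    names = sorted(registry)
    candidates = [
        latest_patches(registry[n]) if supersede_patches else sorted(registry[n])
        for n in names
    ]
    valid = []
    for combo in product(*candidates):
        chosen = dict(zip(names, combo))
        ok = all(
            chosen[dep][:2] == need
            for name, ver in chosen.items()
            for dep, need in registry[name][ver].items()
        )
        if ok:
            valid.append(combo)
    # max() raises ValueError if no combination is satisfiable
    return dict(zip(names, max(valid)))


naive = resolve(REGISTRY, supersede_patches=False)
strict = resolve(REGISTRY, supersede_patches=True)
print(naive)
print(strict)
```

Without superseding, the sketch selects B v2.1.0 alongside A v1.2.0, which is the broken combination. With superseding, B v2.1.0 is never a candidate and B v2.0.0 is selected instead.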

Nov 16 '16 17:11 StefanKarpinski

The core of @tkelman's objection (assuming he's not against the idea of immutable version metadata entirely, which would be good to get an answer on), seems to be that updating version metadata via new patches allows metadata fixes to be mixed with bug fixes – well, technically arbitrary source code changes, since people may not just fix bugs in patch versions. But if people stick with bug fixes in patches, this won't be a problem: why would you want a buggier version? Yes, people will screw up bug fixes, but then the appropriate action is to make another patch that fixes the fix.

Fixing version metadata for 1.2.3 by releasing 1.2.4 is less flexible than adding another level of metadata-changes-only versioning like 1.2.3+1. So why not just add another layer and semantically separate metadata changes from code changes of any kind? One reason is that semantic versioning already has three layers of versioning, which is already a lot to deal with and reason about, and adding another one seems complicated and unnecessary. At the level of practical development, people only use branches corresponding to major/minor versions: patches occur on branches with names like release-1.2 – if you want to make a new 1.2.x release, you tag the tip of release-1.2. How would this workflow change with metadata-only changes like 1.2.3+1? You need a branch for each patch release now: you'd make metadata-only fixes on release-1.2.3 and you'd need a branch like that for every single release. That just seems ridiculous. If you make metadata fixes via new patch releases, mixed in with other bug fixes, then the current workflow doesn't change at all – just fix version metadata on the release-1.2 branch and tag a new patch.

My perspective is that we want to design the package manager so that making patch versions that do anything besides fixing bugs is problematic. This will actively encourage package developers to only fix bugs in patches. Two features of the proposed design that encourage this are:

Having newer patches fully supersede older ones with the same major/minor version.
Not allowing version dependencies to specify versions at patch granularity.

Both of these design choices assume that patches with the same major/minor version are equivalent aside from metadata updates and bug fixes. If a package maintainer violates this assumption by adding or removing functionality in a patch, it will cause problems. Problems lead to complaints, which will provide feedback to the maintainer and help them learn that this is bad practice and not do it in the future. This is not based on some sort of groundless optimism that people will do things correctly on their own, it's based on the principle that people respond to feedback and that we can design a system that actively causes people to receive corrective feedback. Is this limiting the ways that package developers can version their packages and have things work smoothly? Yes, but I think that's a good thing.

Nov 16 '16 17:11 StefanKarpinski

If a compatibility-only change can be done only at the registry level without needing the source to change at all, then there's no need for a branch for a compatibility revision.

Designing the system to be intentionally rigid and inherently flawed in the face of a behavior that people will commonly do (a recent example, changing the type of a single parameter of a single function - that breaks the api but seems like a minor change), and in a way that cannot be easily fixed once newer versions have been published, is why I think this goal is a bad idea.

The core job of a package manager is this: if source has been published as a release version, it should be possible to depend on it. Demoting the patch level of versioning from this is unnecessary, adds friction to the system, and doesn't gain us anything. Downstream users are the ones who face problems from versioning mistakes, and are incapable of fixing them or working around them without cooperation from the upstream author, or forking the package and re-releasing a new series of different version numbers. We don't gain enough for this to be worth it.

Nov 16 '16 20:11 tkelman

What qualifies as a bugfix is not always clear cut either. In fixing one bug, you can often accidentally (or intentionally!) break something else that downstream users were depending on. And these issues don't get identified immediately. By the time some of these issues are found, the upstream author may have moved on to a newer release series, that the downstream users don't have time to upgrade to right away (especially if there was a past release that worked fine for them). What option does downstream have to get their code working again? They could publish a fork without any of the more recent releases, but why have we made them go to that trouble when a patch level upper bound would serve the exact same purpose?

Nov 16 '16 20:11 tkelman

The problem with having registry-only compatibility changes is that it:

makes compatibility confusing since there are multiple conflicting – and changing – sources of what a version's compatibility actually is, and it
makes registered and unregistered packages work completely differently – registered packages have a mechanism for amending compatibility while unregistered ones don't.

The process I'm proposing is straightforward and the same for registered or unregistered packages: keep definitive compatibility info in Config.toml; when compatibility needs to be adjusted, just edit Config.toml on the appropriate release branch, commit the changes and publish the tip of the release branch as a new patch.

Preferring the latest patch for version resolution doesn't make it impossible to use older patches, nor does it force users to upgrade to the latest patch – if what they're using works, no problem:

if you're already using v2.1.0 and it works, no problem
if an environment records v2.1.0 and you run it, you get v2.1.0
if you install or upgrade, then yes, you’ll always get v2.1.1 instead of v2.1.0
but you can still explicitly ask for v2.1.0, e.g. with pkg> add A = 2.1.0
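
The four cases above can be sketched as a single selection rule for one package. The function name and its parameters are hypothetical, and the sketch assumes versions are `(major, minor, patch)` tuples within a single major/minor series.

```python
def pick_version(available, recorded=None, requested=None):
    """Choose a version of one package from the set of available versions."""
    if requested is not None:
        # An explicit request wins, as long as that version exists.
        if requested not in available:
            raise LookupError(f"version {requested} is not available")
        return requested
    if recorded is not None:
        # Running an existing environment: use exactly what was recorded.
        return recorded
    # Fresh install or upgrade: take the latest patch.
    return max(available)


available = {(2, 1, 0), (2, 1, 1)}
print(pick_version(available))                       # install or upgrade
print(pick_version(available, recorded=(2, 1, 0)))   # run a recorded environment
print(pick_version(available, requested=(2, 1, 0)))  # explicit request
```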

The example you allude to (where was this?) with a changed type parameter is a simple broken patch. The correct fix in such a situation, if you depend on the package, is to exclude that specific broken patch, which solves the problem; if you're the package maintainer, the fix is to revert the part of the change that broke compatibility for someone and make a new patch release. Neither is a big problem.

I would love an actual problematic case that can't be handled with what I'm proposing instead of general arguments about what package managers should or shouldn't do. If there's some problem scenario, I want to know about it. The kind of example @simonbyrne presented is exactly what I'm talking about (hopefully my answer to that is convincing to him). The Compat example in #3 is also exactly what I'm talking about: the fact that minor updates to packages with many dependents (Compat being the most extreme example) would force patching of all dependents is a devastating problem with my original proposal, hence https://github.com/JuliaLang/Juleps/issues/15#issuecomment-261025316.

Nov 16 '16 22:11 StefanKarpinski

The problem is that the "broken patch" is broken from the perspective of downstream users who were using the old API, but intended as a new API by the upstream author. Upstream isn't going to revert it. Downstream then needs to indicate that all future patches are broken. That's not possible in this proposal: every new upstream release would break the downstream until downstream gets a chance to add another broken patch to their list.

It's not possible for compatibility to be set in stone and never change - compatibility depends on the entire set of possible interacting versions of dependencies, it always changes as new versions get released.

Nov 16 '16 23:11 tkelman

You are proposing making it impossible to declare version compatibility bounds at patch granularity. That's necessary in the case above, where

package B depends on package A, which is at say v 1.3.3 when package B gets written (and it relies on a feature that was new in 1.3.0)
package A breaks api between versions 1.3.5 and 1.3.6
package A makes many more 1.3.x releases, several 1.4.y, and has started on 2.0.0
package B gets a report that it doesn't work any more with package A v1.4.3

Assuming the author of package B can remember or recover from environment info what version of package A did work, there's no way in this proposal of reflecting its requirements since it can't express an upper bound on A v 1.3.6 that caused the problem. It could say every patch from 1.3.6 on is broken, but if those have to be listed individually then it becomes incorrect as soon as an additional 1.3.17 backport gets released. The most practical solution to immediately get a working version of its dependency is to republish a fork of the old version of package A.
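
A minimal sketch of the difference between listing broken patches individually and stating a patch-level upper bound. The specific broken versions (1.3.6 through 1.3.16 and 1.4.0 through 1.4.3) are made up for illustration, and both function names are hypothetical.

```python
def allowed_by_exclusions(version, minimum, excluded):
    """Lower bound plus an explicit list of known-broken versions."""
    return version >= minimum and version not in excluded


def allowed_by_range(version, minimum, upper):
    """Half-open range at patch granularity: minimum <= version < upper."""
    return minimum <= version < upper


# Package B works with A from 1.3.0 up to, but not including, 1.3.6.
minimum = (1, 3, 0)
known_broken = {(1, 3, p) for p in range(6, 17)} | {(1, 4, p) for p in range(0, 4)}

new_backport = (1, 3, 17)  # released after B listed the broken versions

# The exclusion list is stale, so the new backport is wrongly accepted.
print(allowed_by_exclusions(new_backport, minimum, known_broken))
# The upper bound keeps rejecting it without any further edits.
print(allowed_by_range(new_backport, minimum, (1, 3, 6)))
```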

What problem is solved by disallowing requirements at patch granularity, and disallowing expressing requirements as ranges?

Nov 16 '16 23:11 tkelman

The subject of this issue is immutability of compatibility, which is orthogonal to patch granularity. I was trying to unmuddy the discussion by splitting #3 into this issue and #15, which would be a better place to discuss patch granularity, although that's explicitly about the opposite complaint: that the granularity is too fine, which I already conceded.

Nov 16 '16 23:11 StefanKarpinski

Splitting a discussion without posting to that effect in the discussion itself isn't terribly effective.

Compatibility constraints are either correct, too tight, or too loose with respect to the time and set of available dependency versions when you state them. As new versions become available, a previously correct set of constraints can become too tight if it doesn't include working versions, too loose if it does not indicate new breakage, or remain correct. Compatibility claims that were too tight or too loose when they were first made may need to be amended after the fact.
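
The three states can be sketched by comparing what a constraint claims with what actually works, restricted to the dependency versions available at a given time. The `classify` function and its predicate arguments are hypothetical, and the version numbers are made up for illustration.

```python
def classify(declared, works, available):
    """Compare declared-compatible versions with working ones among `available`."""
    declared_now = {v for v in available if declared(v)}
    working_now = {v for v in available if works(v)}
    if declared_now == working_now:
        return "correct"
    if declared_now < working_now:
        return "too tight"   # excludes versions that work
    if declared_now > working_now:
        return "too loose"   # admits versions that are broken
    return "too tight and too loose"


def works(v):
    """Ground truth for this example: everything before 1.4 works."""
    return v < (1, 4)


def at_least_1_2(v):
    """Declared constraint: 1.2 or newer."""
    return v >= (1, 2)


def exactly_1_2(v):
    """Declared constraint: only the 1.2 series."""
    return v == (1, 2)


then = {(1, 2), (1, 3)}
later = then | {(1, 4)}

print(classify(at_least_1_2, works, then))   # when the constraint was stated
print(classify(at_least_1_2, works, later))  # after 1.4 is released
print(classify(exactly_1_2, works, then))
```

The same declared constraint moves from correct to too loose purely because a new dependency version was released, with no change to the package itself.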

If making personal registries is simple, then I don't think it's worth worrying about how to amend compatibility for unregistered packages. Source releases should be immutable, compatibility often needs to be amended, so compatibility should be tracked outside of the source. If you need to amend compatibility for an unregistered package, then create a personal registry to track it.

Nov 17 '16 00:11 tkelman

I really hope that package management and compatibility can be managed outside of the actual codebase as much as possible. In fact I wish that we didn't use git tags at all. Forcing package authors to add new commits (and tag them) just to fix a dependency resolution is ridiculous. Please let's put all requirements outside of the actual package repo. Let a core group of people manage those dependencies for the curated metadata, with advice from authors. Private metadatas will be easier to manage as well.

Nov 17 '16 01:11 tbreloff

+1.618 for allowing me to become unconcerned with anything git related

Nov 17 '16 02:11 JeffreySarnoff

@tbreloff package authors need to be responsible for dependency versioning. What features are you using, when things break how do you fix or work around them, etc. That comes with the territory of having dependencies. If you get any help you're lucky, but you can't expect other people to do this for you.

An outside-of-the-source copy of the dependency information may need to take priority here though, as in the existing system where metadata is used for registered packages, the package's copy of REQUIRE isn't actually used except at tag time to populate the initial content.

A compatibility-only revision release could be a mechanism for this, but it needs to be possible to do that for any published release, not just the latest within a minor series. Compatibility is about the rest of the world with respect to a fixed version of a package - we shouldn't be mixing the release numbering or resolution mechanism for outside-world compatibility within the same system (and constraints) that we use for a package's own source.

Nov 17 '16 06:11 tkelman

So then maybe what I'd like is a little more subtle. It would be nice if the larger community had a mechanism to tag and fix dependencies in place of authors that don't have the time or knowledge to keep up with the process. How many times a day do you have to tell people exactly what they need to do and how to do it in order to properly register or tag? Wouldn't it be easier for everyone involved if you just did it yourself? You're the one with commit access to metadata, so why go through the silly and pointless steps that make it seem like the author has anything valuable to add? I'd be happy with v1.2+ and v1.2.3+ if it means problems are immediately solved by the people who understand the right way to solve them.

tl;dr Manage as much as possible from within metadata(s) without necessarily requiring the author

Nov 17 '16 12:11 tbreloff

The notion that you can build a functioning ecosystem of reusable software without authors thinking about versioning at all strikes me as incredibly implausible, not to mention totally unscalable. Who's going to be spending all of their time figuring out how to version every single registered package? Your answer here seems to be "I dunno, but not me." If you want to develop software that way, that's cool – then don't register your packages. What I'm proposing will support unregistered packages much better, but it won't change the fact that following along with whatever happens to be on master on a set of packages will not be a good way to build systems that don't break all the time.

Nov 17 '16 13:11 StefanKarpinski

> without authors thinking about versioning at all

Of course there's a middle ground. Authors think about the high level versioning, but not necessarily the gritty details (that frequently are due to other packages out of their control). Those details should either be handled by automation or by expert guidance, depending on the situation.

> Your answer here seems to be "I dunno, but not me."

When it comes to curated metadata repos, if I'm not a curator then the final responsibility is not mine. Package authors can guide versioning (and should be encouraged to do as much as possible themselves), but this mentality that curators should never make changes to the thing they're curating, and should instead put social pressure on package authors until they make the exact change that the curator could have made in the first place... it's just stupid. I want to see the curation as disjoint from the code.

> following along with whatever happens to be on master on a set of packages will not be a good way to build systems that don't break all the time.

I couldn't agree more, which is why I care so much about making it dirt-simple to "do the right thing".

Nov 17 '16 14:11 tbreloff

@StefanKarpinski @tbreloff: each of you is right, in important measure.

I have seen that the need for handholding, in the less well traveled regions of the deep end of the pool, increases superlinearly. @tkelman The work you do helping us deal with tags and git when it goes on a bender probably is more informative than predictive.

This Summer and next Fall I expect for Julia a flood of new and very active involvement. Something is going to feel the extra weight. :walking_man: (mmph, :cry:) "I do not want to play with git" (:cry:, mmph)

between update and upgrade. ?uplift

Nov 17 '16 15:11 JeffreySarnoff

Perhaps it would be useful to gather some data.

Do we have examples where versions were tagged incorrectly, or other "broken" version resolution cases? How were these resolved?
- one that I can think of was the problem with Distributions/StatsFuns.
How do other package managers handle these problems?

Nov 17 '16 15:11 simonbyrne

@tkelman Do you recall any of my chained missteps?

Nov 17 '16 15:11 JeffreySarnoff