ecosystem-proposals
Uncurated Hackage Layer
There is a tension between two purposes of Hackage -- first as a central repository of Haskell code, and second as a curated store that has artifacts that are intended to be correctly built and depended upon in a self-contained fashion. The aim of this proposal is to separate these two purposes, by allowing authors to distinguish if they wish to opt-out of following the PVP and the attendant curation process that helps to maintain correct dependency information.
Rendered proposal: https://github.com/gbaz/ecosystem-proposals/blob/gbaz-uncurated/proposals/0000-uncurated-layer.rst
Thank you @gbaz for writing this up (so quickly)! I am, however, not supportive of it in its current form. I believe it will ultimately lead to fewer packages on hackage and many more on the uncurated hackage.
I think there's also a problem understanding what hackage is. hackage has always been a curated package repository, where trustees curate packages.
Curated packages cannot depend on uncurated packages, and the hackage server will detect this as an error at upload time.
This seems like a good way to end up with lots of packages in the uncurated package index, even though the maintainer might be sympathetic to the PVP but can't be bothered to start talking to everyone in his dependency chain to please follow the PVP...
Uncurated packages may be "adopted" into the curated ecosystem by trustees. Metadata revisions necessarily remove the x-uncurated property from the revised cabal metadata.
... or try to pressure trustees to do the work for him.
Curated package uploads will be checked on upload to ensure they don't have dependencies on uncurated packages. Further, the curated index will only provide information on curated packages.
So we are eventually going to force PVP, and exclude non PVP from visibility? Would cabal-install still see both indices or only the curated one? If it's the latter, it would be a loss as opposed to what we have right now.
As such, I believe this will lead to more fragmentation and a split, rather than to what is intended.
As I've mentioned on the SLURP proposal, I'm in favor of providing raw-hackage (which you seem to want as well; my raw-hackage = your hackage/uncurated -- the immutable package store hackage already has, with an empty append-only revisions index). I'd also prefer it to be at raw.hackage.haskell.org, such that all I would need to do is change the domain to switch between them.
On top of that, I'd like to see hackage become an overlay (where improvements to the overlay logic may be needed). As I understand it, Hackage right now is the immutable package store with an append-only revisions index. What I'm suggesting is to turn the append-only revisions index into just an overlay.
As I understand it, we want the PVP so we can have tooling like matrix.hho and other tooling that might come down the line. My suggestion would be that we do not enforce this at the package index level, but at the tooling level. If you try to use a package that doesn't follow the PVP with the tools and want to reap the benefits of the tooling, the tooling will politely inform you that it cannot handle the package without PVP.
I'd also like to see a more open policy where the community could provide PVP PRs against the overlay if they want a certain (package, version) to follow the PVP. (Of course not every package can be made to follow the PVP, but some might). Maybe the maintainer doesn't see the need to follow a strict PVP, but someone else might and would want to provide the necessary change, so that he can use the tooling that relies on the PVP. If all that's needed is a PR against the hackage overlay, that seems like a rather low barrier to contribution. If packages have their repository / maintainer contacts set, the author could be pinged about this as well.
At this point the binary flag you propose could come into play and work as a whitelist of who would like to be notified about PRs against the overlay?
As I've mentioned on the SLURP proposal, I'm in favor of providing raw-hackage (which you seem to want as well; my raw-hackage = your hackage/uncurated -- the immutable package store hackage already has, with an empty append-only revisions index). I'd also prefer it to be at raw.hackage.haskell.org, such that all I would need to do is change the domain to switch between them.
What is the difference between the two ideas then, outside of the domain name? (I'm fine with bikeshedding how the uncurated repo is served any which way).
It sounds like the big difference is @angerman is suggesting that the curated layer should contain every package in the raw layer, it would just also provide revisions for some of them. Whereas this proposal suggests the curated layer would have a subset of the packages in the raw layer (and not permit dependency links into the raw layer).
@cumber correct!
I’d also want to move the PVP enforcement into the tools that take advantage of PVP and not into hackage. Hackage and cabal-install seem to work with the current upload policy? Am I missing something?
I’d also want to make community participation in hackage package revisions easier by turning hackage into an overlay over hackage-raw/uncurated, that can be driven from a git repository.
I think you're coming at this from the start with overlays in mind, since you've been working on one, and see the benefits. But this proposal starts with a different idea in mind -- collections. Just as stackage is a collection, curated hackage is a collection. It happens the "base material" from which curated hackage will be built is the total of packages on hackage today. But as things evolve, there will be more things in the "uncurated" layer than in the curated layer. As a collection, the things in the curated layer will need the property that collections typically have -- that they are closed under dependencies.
The specific problem with the other approach is that if you have a dependency link into a package that itself does not specify revision bounds, then you transitively also do not specify revision bounds on the dependencies of that package. So that does not suffice. For a curated system to work, it needs to assume that the pvp is followed throughout the ecosystem -- otherwise things fall apart.
An alternate proposal would be to require that curated packages can only depend on uncurated packages if they themselves specify all transitive dependencies they inherit -- but that sounds nightmarish to maintain. Further you lose the generally desirable property of transitive closure.
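The closure property can be made concrete with a small model. This is a minimal sketch, assuming each release carries a curated flag and each dependency is a version range with an inclusive lower and exclusive upper bound; `Release`, `Dep`, `closure_violations` and the package names are hypothetical and not part of hackage-server or Cabal. It checks that every dependency of a curated release can be satisfied by some curated release, i.e. the per-edge form of "closed under dependencies".

```python
from typing import NamedTuple, Optional, Tuple

Version = Tuple[int, ...]  # e.g. (1, 2, 0) for version 1.2.0


class Dep(NamedTuple):
    name: str
    lower: Version            # inclusive lower bound
    upper: Optional[Version]  # exclusive upper bound; None means no upper bound

    def allows(self, v: Version) -> bool:
        return v >= self.lower and (self.upper is None or v < self.upper)


class Release(NamedTuple):
    name: str
    version: Version
    curated: bool
    deps: Tuple[Dep, ...] = ()


def closure_violations(index):
    """Return (release, dependency name) pairs where a curated release
    depends on something no curated release in the index can satisfy."""
    curated = [r for r in index if r.curated]
    violations = []
    for r in curated:
        for d in r.deps:
            if not any(c.name == d.name and d.allows(c.version) for c in curated):
                label = f"{r.name}-{'.'.join(map(str, r.version))}"
                violations.append((label, d.name))
    return violations


index = [
    Release("lib-a", (2, 0), True),
    Release("lib-b", (1, 1), True, (Dep("lib-a", (1, 0), (3, 0)),)),
    Release("research-toy", (0, 1), False),  # opted out of curation
    Release("app", (1, 0), True, (Dep("research-toy", (0, 1), None),)),
]
print(closure_violations(index))  # [('app-1.0', 'research-toy')]
```

An upload-time check would run the same test for just the incoming release. Note that this per-edge test is weaker than requiring a jointly consistent install plan.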
As to why curated should not be just a "revisions" overlay -- the idea is that people should be able to depend on an index that contains packages that are known to be installable (this is the purpose of curation). That's what the curated index provides. One can of course also use the curated index as an overlay on uncurated. So all the combinations are possible: 1) everything, with no revisions, 2) everything, with revisions, 3) only curated things. (Again, for the last to work, this requires that curated does not depend on uncurated).
The question then becomes, which of the three setups should ship by default with cabal (in the default config file). This proposal does not specify this. My feeling would be that the curated index should be the default, since it consists of things that are and will continue to be cabal-installable. But this is certainly a question for discussion -- and one in fact that need not be resolved for this proposal to be implemented.
You also write "This seems like a good way to end up with lots of packages that end up in the uncurated package index, even though they maintainer might be sympathetic to the PVP but can't be bothered to start talking to everyone in his dependency chain to please follow the PVP..."
In the abstract, this is a good concern. In practice, I think that this will be less of one. In my experience, and based on conversations with trustees, most things actually try to follow the pvp, or otherwise stay pretty green. Further, while we have a lot of packages on hackage, there are a relatively small number that are commonly in dependency chains.
By back-of-the-envelope accounting, of roughly 12,000 packages on hackage, only 1/3 have any rev-deps, only 1/6 have more than one, and 5% (650) have more than 10.
So that seems like a manageable amount to worry about :-)
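Counts like these can be recomputed mechanically once the dependency graph is known. This is a minimal sketch, assuming a dict from each package name to the names it depends on has already been extracted from the index (that step is not shown); `revdep_fractions` is a hypothetical helper. For scale, 650 out of roughly 12,000 is about 5.4%.

```python
from collections import Counter


def revdep_fractions(deps):
    """deps maps each package name to the set of package names it depends on.
    Return the fraction of packages with >0, >1 and >10 reverse dependencies."""
    revdeps = Counter()
    for pkg, ds in deps.items():
        for d in set(ds):
            if d != pkg:  # ignore self-references
                revdeps[d] += 1
    total = len(deps)

    def share(threshold):
        return sum(1 for p in deps if revdeps[p] > threshold) / total

    return {"any": share(0), "more_than_1": share(1), "more_than_10": share(10)}


graph = {"a": set(), "b": {"a"}, "c": {"a", "b"}, "d": {"a"}}
print(revdep_fractions(graph))  # {'any': 0.5, 'more_than_1': 0.25, 'more_than_10': 0.0}
```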
Edit: oh, and one more thought -- if indeed there ends up being a logjam of requests for adoption of uncurated packages into curated, we can always either add more trustees or move adoption rights to a group beyond just trustees. I would also hope in the future that tooling can be produced (perhaps in conjunction with the matrix builder) that can render adoption mostly automatic.
Thanks for typing this up! I don’t currently have time to give it a full review, but I am broadly in favor of it. The only part I strongly disagree with is (eventually) hiding uncurated packages from the UI entirely. I have no problem with distinguishing curated packages, but I feel that uncurated packages should always be visible.
To be clear, the idea (and this certainly is up for discussion) is that there would be a filter feature that does not now exist, which lets certain things (e.g. deprecated packages, packages that are executables) be filtered out. This would be some checkboxes in an in-page javascript thing, probably tied to the functionality that now exists in datatables. The default setting for these would be to filter the uncurated collection, but users could change this.
Also, in such filtering interfaces we would certainly want to indicate how many results are filtered away and hidden, so that users could expand to see all the missing things with a single click.
The general thought is people search hackage for packages that are cabal-installable. Curation is what indicates that. If people want to find packages usable from a given stackage lts, they tend to look at stackage. So making curated packages the ones that are most easily discoverable suits the needs of hackage users.
If we manage to move into a situation where there are multiple collections on hackage (of which curated is just one), then we'd need to revisit this. E.g. if stackage lts releases were also presented as searchable collections, etc. then we'd definitely want to have a different approach.
@gbaz you are correct in that I see this through my experience with overlays.
I think you're coming at this from the start with overlays in mind, since you've been working on one, and see the benefits. But this proposal starts with a different idea in mind -- collections. Just as stackage is a collection, curated hackage is a collection. It happens the "base material" from which curated hackage will be built is the total of packages on hackage today. But as things evolve, there will be more things in the "uncurated" layer than in the curated layer. As a collection, the things in the curated layer will need the property that collections typically have -- that they are closed under dependencies.
My basic question here is why can't we improve overlays (allow filtering), such that we can use the same approach to arrive at collections as well?
So what I am really wondering is whether we can achieve what this proposal aims for by slightly adjusting the overlay logic we already have, and thereby provide a uniform mechanism that can be used for a variety of use cases we could come up with?
Under this assumption even stackage could be represented as an overlay over hackage.
Edit: The basic question I have is: assuming we have some hypothetical overlay solution that may be slightly different from / improved over the current overlay solution we have, can't we simply use that as the implementation for this proposal, without needing any custom modifications to hackage just for this proposal, while providing a basic building block that could be used for other purposes as well?
@gbaz the current curated hackage (with revisions) does work as it is, or am I missing something?
As to why curated should not be just a "revisions" overlay -- the idea is that people should be able to depend on an index that contains packages that are known to be installable (this is the purpose of curation). That's what the curated index provides.
I must be missing something from this proposal then. The way I understand it is to add an additional metadata flag, and a tightening of PVP requirements. Yet, this is a departure from the status quo, which seems to work (for me only?).
My feeling is that overlays and collections are two almost opposite ideas. Overlays allow monotone addition of data. Collections are "coherent subsets" of data. The union of two closed sets is itself closed. A subset of a closed set may be open and need completion under a closure operation. I don't see how overlays can produce collections, since overlays are really about unions. In fact, I could imagine that under an imaginary new collections setup, overlays could also provide overlay modifications to collections. Anyway, we're far afield here.
More directly, if I am to understand your second question -- you are asking if curated hackage with revisions works as is, and so why we need to depart from it?
The answer, I think, is it works mainly, for many of us, but not for some people. In particular, on the one side, trustees get frustrated that packages choose not to adhere to versioning standards, leading to a lot of breakage. On the other side, authors can get frustrated when they just want to toss up something as e.g. a research project or the like, or for whatever reason just don't want to follow versioning guidelines, but they find themselves getting requests to change their versioning.
This is because of the dual role Hackage plays, which I alluded to above. You wrote earlier "hackage has always been a curated package repository, where trustees curate packages." Well, yes and no. Hackage hasn't always had trustees, or curation, or revisions. It has had a variety of changes over the years. But it has always been the place to put released Haskell packages. We want it to be able to continue to play that role for everyone, while also providing important features relating to the health of a package ecosystem, which is a consideration (that we would have an ecosystem so large and complex that we needed to maintain its health!) that was hard to imagine back when getting 100 packages onto it was considered a major milestone.
By letting people who want to use hackage as simply a place to release haskell packages do so without fuss, and also establishing a curated layer which is managed as an ecosystem we ideally make everyone happier. Trustees need not worry about packages which opt-out -- they can adopt them, or they can leave them be, but there doesn't need to be a weird middle ground. Authors who don't want to worry about the ecosystem can decide not to. Then someone else can decide to make it their problem (through offering co-maintainership ideally, for the purpose of creating revisions) or not.
At the moment, on hackage, we de facto have packages that are curated and those that effectively have opted out, either by direct request to trustees, or by just doing their own thing but not being on anyone's "radar" by not being a noticeable part of the revdeps graph. But this is all somewhat confused and muddled, and when a package that tries to do the right thing depends on one that does not (and here I know of examples, but don't want to list them, because I don't want to foster animosity -- I want to remove it) then that causes a bunch of work for the trustees.
In a sense this proposal is to make more formal and clear to everyone (and enforce technically) a lot of what has already been semi-worked-out informally, and to let people signal their existing practices more effectively to prevent miscommunication.
+1 @angerman's idea. I think we really need to do the thing that just gets out of people's way. Having to opt in or out, and being told you can't do things because of another person's opting (which, maybe, can change over time), is going to lead to frustration eventually. For example, I would love to follow the PVP, but there are a lot of libraries I like to depend on that don't. Being in a sort of half-way state, where I can do my best but ultimately have a major failure point is much better than not letting me do my best. I think there's a good portion of libraries I would like to write that would end up uncurated even though I'd like to be using PVP. Also, as I alluded to earlier, what if one of my dependencies opts out of the PVP in a later version? I suppose this doesn't need to be possible, but it is another wrench in the plan, as I think some people don't want to be locked into the PVP forever. All in all, "uncurated" will behave like a virus, and eventually everyone will be forced to make all new packages uncurated, defeating the purpose of the proposal.
Having an uncurated layer, with an overlay modeling the current model of unilateral curation by trustees, is going to make it much easier for users to stay out of each others' way. It will likely be harder on trustees, which I believe may be one of the prevailing concerns against this idea. But frankly, I think making things harder on the trustees is better than making things harder on all users.
This has to have a good user experience. Throwing multiple new wrenches into a beginner's face as proposed is going to go very poorly: 1) Now when they upload their first package, they have to figure out why one would opt in or out. 2) They have to deal with Hackage rejecting dependency graphs that the user doesn't understand aren't fully PVP compliant, thus leading newcomers to opt out of PVP by default just to shut Hackage up. I think that as is, this proposal somewhat violates the principle of least astonishment, which seems like a pretty critical principle in the current environment.
That said, I would still prefer this proposal as is over doing nothing. So +1 from me.
Uncurated packages may be "adopted" into the curated ecosystem by trustees. Metadata revisions necessarily remove the x-uncurated property from the revised cabal metadata.
I don't understand how "x-uncurated" is considered an opt out of trustee revisions, if trustees have the power to "adopt" any uncurated packages.
Hackage will provide two package repository roots -- http://hackage.haskell.org and http://hackage.haskell.org/uncurated. These roots will provide index-01.tar.gz files that contain the information, respectively, for curated packages, or for all packages. The uncurated root will contain no revision information.
I'm not sure I understand this correctly, either. Doesn't the hackage.haskell.org root already include all "unrevised" information in addition to revision information? The uncurated root therefore provides strictly less information. What's the point? Can't tooling just access whatever info it wants?
I understand the stated motivation in theory, but don't see what meaningful change this proposal accomplishes in practice. I like the general idea but something is not quite clicking for me.
I don't understand how "x-uncurated" is considered an opt out of trustee revisions, if trustees have the power to "adopt" any uncurated packages.
Thanks to this question, I just realized that I think I got something wrong in the proposal. The reason it opts-out of revisions is that packages in the uncurated index do not include revisions. However, the implication in the proposal is that all packages in the uncurated index do not include revisions. This is wrong, I think. Rather, the uncurated index should be [curated with revisions + uncurated without revisions]. And the curated index should be [curated with revisions + adopted uncurated with revisions].
That should make clear the difference.
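The corrected layout can be written down as two views over the same uploads. This is a minimal sketch, assuming (as the proposal states) that any metadata revision removes x-uncurated, so a revised uncurated upload counts as adopted; `Upload`, `is_curated` and the two view functions are hypothetical names, not hackage-server code.

```python
from typing import NamedTuple, Tuple


class Upload(NamedTuple):
    name: str
    version: str
    x_uncurated: bool                # flag set by the author at upload time
    revisions: Tuple[str, ...] = ()  # metadata revisions, oldest first


def is_curated(u: Upload) -> bool:
    # A revision removes x-uncurated, so a revised upload has been adopted.
    return not u.x_uncurated or bool(u.revisions)


def uncurated_index(uploads):
    """Every version: curated ones with their revisions, the rest as uploaded.
    (An unadopted upload has no revisions, so u.revisions is () for it.)"""
    return {(u.name, u.version): u.revisions for u in uploads}


def curated_index(uploads):
    """Only curated versions, including adopted ones, with their revisions."""
    return {(u.name, u.version): u.revisions for u in uploads if is_curated(u)}


uploads = [
    Upload("lib-a", "1.0", False, ("add upper bounds",)),
    Upload("toy", "0.1", True),
    Upload("toy", "0.2", True, ("adopted by trustee",)),
]
print(sorted(curated_index(uploads)))  # [('lib-a', '1.0'), ('toy', '0.2')]
```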
I'm not sure I understand this correctly, either. Doesn't the hackage.haskell.org root already include all "unrevised" information in addition to revision information? The uncurated root therefore provides strictly less information. What's the point? Can't tooling just access whatever info it wants?
This is also a pretty good point. Arguably we could just make all the information available and leave it to tooling to interpret it as it sees fit. However, tooling that exists currently can't do that. By providing two index files we make the choice available without any downstream changes.
Again, long term, with mythical collection support, we wouldn't want to end up with an index-per-collection. But in the meantime, this lets us get to the goal in a more modular way.
I also think @ElvishJerricco has two good concerns, first about if "uncurated" will have a sort of inevitable gravitational pull (the tendency of the universe towards chaos, entropy consuming all things, etc.). I was hoping adoption would stem that. But I would really like current hackage trustees themselves to weigh in, since they have the best feel for the dynamics of the ecosystem as it stands.
The second is about new user experience, principle of least astonishment, etc. My take at the moment is the following -- the choice of flag isn't forced on uploaders. Rather, it is false by default. If the transitive check fails (and ideally it does not), then there is a message explaining this, with a link to a good explanation, and suggesting they seek the packages that caused the check to fail to be adopted.
In general, recall that transitivity-enforcement is step five of a five part plan. It might be worthwhile to include something in the proposal that before conducting this step, we need to reassess the state of things to be confident it can be done relatively painlessly, and otherwise seek other mitigating measures or reassess. For example, we could render it a warning first, and keep track of how many times and in what cases the server issues the warning. That way we know if it is an issue in practice or not.
That said, I realized I can motivate the need for transitivity and two indexes by the following chain of implications.
1. We have to start curated with the existing full package set.
2. Therefore we need curation to be a per-version flag so packages can migrate out.
3. Therefore the solver may well behave differently in an index that contains both curated and uncurated versions, compared to one that only contains curated versions, even if some version of every dependency is present in both.
4. Therefore we need to provide a distinct curated index, and additionally that index must be transitively closed.
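The third step is easiest to see with a toy resolver that, like most solvers, prefers the newest version satisfying a constraint. This sketch is not cabal's solver; `newest_allowed` and the package name are hypothetical. The same constraint resolves to different versions depending on whether the index also contains uncurated releases.

```python
def newest_allowed(index, name, allows):
    """Return the newest version of `name` accepted by `allows`, or None.
    index is a list of (name, version, curated) entries."""
    candidates = [v for (n, v, _curated) in index if n == name and allows(v)]
    return max(candidates, default=None)


full_index = [
    ("dep", (1, 0), True),   # curated release
    ("dep", (1, 1), False),  # newer release uploaded as uncurated
]
curated_only = [entry for entry in full_index if entry[2]]


def wants_dep(v):
    return v >= (1, 0)  # a dependency with no upper bound


print(newest_allowed(full_index, "dep", wants_dep))    # (1, 1)
print(newest_allowed(curated_only, "dep", wants_dep))  # (1, 0)
```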
My feeling is that overlays and collections are two almost opposite ideas. Overlays allow monotone addition of data. Collections are "coherent subsets" of data. The union of two closed sets is itself closed. A subset of a closed set may be open and need completion under a closure operation. I don't see how overlays can produce collections, since overlays are really about unions. In fact, I could imagine that under an imaginary new collections setup, overlays could also provide overlay modifications to collections. Anyway, we're far afield here.
@gbaz thank you for clearing this up! Your and my understanding/interpretation of overlay seem to diverge a little. To me an overlay can be restrictive on the set it overlays (e.g. via a predicate) and does not necessarily need to be a union. Maybe we just need a different word? From my point of view an improved overlay mechanism could in fact yield a subset based on a predicate, while at the same time provide augmentation of the underlying set (for example to satisfy the predicate).
I'm sorry if this confusion of terminology has led to unnecessary misunderstandings!
Again, my intention was only to push for a mechanism that would let us model collections and patches on top of hackage in a unified, generic manner that would allow the same tooling to be used for all use cases.
Again the benefits I see are (given raw hackage):
- model hackage (as is)
- model hackage (as proposed in here)
- model stackage lts (as is)
- model mobilehaskell and head overlays
- potentially model the composition of stackage lts and mobile haskell or head (there is no clean composition here, as patches just don't compose well, but this might be of little concern in practice if patches are orthogonal and merging two "overlays" results in no merge conflicts)
- model a blessed set of packages (e.g. I can only use packages that are MIT or BSD licensed, including transitive dependencies, due to company policy, ...)
- Eta's Etlas (I believe).
- model something we haven't thought of yet, that fits in the generic framework of augment (patch) and restrict (predicate).
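The "augment (patch) and restrict (predicate)" framing can be sketched in a few lines. This is a minimal model, assuming an index is a dict from (package, version) to metadata fields; `apply_overlay`, the predicates and the sample data are hypothetical. Patches are applied before the predicate, so a patch can make an entry satisfy it. The license predicate only looks at each release on its own; the transitive rule in the licensing bullet would additionally need a closure check.

```python
def apply_overlay(base, patches=None, keep=None):
    """Augment then restrict: merge per-entry patches into the base metadata,
    then keep only the entries for which keep(key, metadata) is true."""
    patches = patches or {}
    out = {}
    for key, meta in base.items():
        merged = {**meta, **patches.get(key, {})}
        if keep is None or keep(key, merged):
            out[key] = merged
    return out


def is_curated(key, meta):
    return meta.get("x-curated", True)  # missing field: treated as curated


def permissive_license(key, meta):
    return meta.get("license") in {"MIT", "BSD-3-Clause"}


raw = {
    ("lib-a", "1.0"): {"license": "MIT"},
    ("lib-b", "2.0"): {"license": "GPL-3.0-only"},
    ("toy", "0.1"): {"license": "BSD-3-Clause", "x-curated": False},
    ("scratch", "0.0.1"): {"license": "MIT", "x-curated": False},
}
adopt = {("toy", "0.1"): {"x-curated": True}}  # a patch adopting toy-0.1

curated = apply_overlay(raw, adopt, is_curated)
blessed = apply_overlay(curated, keep=permissive_license)  # overlays compose
print(sorted(curated))  # [('lib-a', '1.0'), ('lib-b', '2.0'), ('toy', '0.1')]
print(sorted(blessed))  # [('lib-a', '1.0'), ('toy', '0.1')]
```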
Thanks @gbaz, I think this looks promising.
I'm reading this while I'm in a bit of a hurry so please excuse me if I missed something here.
I'm a bit confused about what x-uncurated means. Is it that a release is currently only part of the uncurated index but can be revised by a trustee to be curated, or is it an opt-out for maintainers to say that a package will never be a part of the curated index?
As @ElvishJerricco wondered, will everything gravitate towards being uncurated? I think that depends on the answer to my question above. If any version of a package that is uploaded with curation disabled can be revised for inclusion in the curated index, I do not expect this to be an issue, since trustees will be able to include it. When there is a new release of that package, the curated index will not contain it by default, so the process must be repeated. The curated index would only contain older versions for a while. This is similar to how stackage nightly works: at times it may be months before the latest release of a package makes it into nightly. This may cause other packages that require the newest release to also be left out of nightly. This causes issues sometimes, but rarely; eventually stackage nightly catches up to the latest versions. I speculate that it would be a similarly sized issue for the curated index.
I do expect that this proposal would lead to more work for trustees, but there are at least two solutions to the issue that we can do today. 1. Onboard more trustees, 2. Package maintainers can add co-maintainers that perform revisions. Maintainers may be reluctant to give someone maintainer status for just revisions, if this is the case I suggest that we add a "revisionist" permission level to individual packages so that a user can make revisions to a package but not upload new versions.
I need to head out for today, hopefully I'll have some time to re-read this during the weekend.
@bergmark
1. Onboard more trustees, 2. Package maintainers can add co-maintainers that perform revisions.
Indeed, that's the idea to keep it simple and to incentivise maintainers teaming up with co-maintainers of their choice, as curation inevitably requires some level (even if very minor) of cooperation & communication with a maintainer. I'd like to see how well this works out before considering any more complicated mechanisms (like the "revisionist" bit) in anticipation of something that isn't known to be a problem yet.
At the risk of repeating something already known, some other package managers[1] use the concept of a lock file to solve some parts of these issues.
The idea is that you can depend on uncurated packages by pinning their exact versions, thus ensuring a green build across machines. While mostly used by end consumers, I believe it could be tweaked to be used by packages uploaded to Hackage.
An alternate proposal would be to require that curated packages can only depend on uncurated packages if they themselves specify all transitive dependencies they inherit -- but that sounds nightmarish to maintain. Further you lose the generally desirable property of transitive closure.
This is very true, but by using a lock file you can give the solver a green build plan. The only caveat is the loss of transitive closure when multiple curated packages depend on different versions of an uncurated package.
I guess this is where manual intervention by trustees is needed, but it might allow more packages to have curated status even when depending on uncurated ones. It should also help to triage those uncurated packages that need blessing and/or some work to adhere to the PVP.
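The caveat about conflicting pins can at least be detected mechanically. This is a minimal sketch, assuming each curated package ships a lock of exact versions for its uncurated dependencies; the lock shape and `pin_conflicts` are hypothetical and are not the format of cabal freeze files. Any uncurated package pinned to more than one version is where a trustee would have to step in.

```python
from collections import defaultdict


def pin_conflicts(locks):
    """locks maps each curated package to {uncurated dependency: pinned version}.
    Return the uncurated dependencies pinned to more than one version."""
    pins = defaultdict(set)
    for _pkg, lock in locks.items():
        for dep, version in lock.items():
            pins[dep].add(version)
    # Versions are plain strings here and sorted lexically, for brevity.
    return {dep: sorted(vs) for dep, vs in pins.items() if len(vs) > 1}


locks = {
    "web-app": {"toy-parser": "0.3"},
    "cli-tool": {"toy-parser": "0.4", "other-toy": "1.0"},
    "some-lib": {"other-toy": "1.0"},
}
print(pin_conflicts(locks))  # {'toy-parser': ['0.3', '0.4']}
```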
Edit: I forgot Elm's package manager, which actually tries to enforce semver; this also seems doable in Hackage, at least the check, which should help with loosening version bounds inside those hypothetical lock files for uncurated packages.
[1] Ruby's Bundler, Node's yarn and npm, Rust's Cargo, and Python's pip. Pip does it by convention, by specifying exact requirements instead of ranges.
If I understand the concept correctly, cabal-install already supports lock files (see cabal [new-]freeze), and stack is basically built around this idea.
I like this proposal in general, and I really like per version property.
There is a current "non-solution" of uploading a boundless version and then making a revision to add bounds. It's a non-solution because the current (curated) index is populated with the boundless version for no reason.
This proposal makes that more robust:
- upload boundless with x-curated: False
- make a revision adding bounds and removing the x-curated field
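The two-step workflow above can be modelled as a pure function on package metadata. This is only a sketch, assuming metadata is held as a dict keyed by .cabal field names and that a missing x-curated field means curated; `revise_adding_bounds`, `in_curated_index` and the bounds shown are hypothetical, and real revisions go through Hackage's metadata-revision mechanism rather than code like this.

```python
def revise_adding_bounds(meta, bounds):
    """Return a revised copy of `meta`: add the given version bounds to
    build-depends and drop x-curated, leaving the original untouched."""
    revised = dict(meta)
    revised.pop("x-curated", None)
    revised["build-depends"] = [
        f"{dep} {bounds[dep]}" if dep in bounds else dep
        for dep in meta["build-depends"]
    ]
    return revised


def in_curated_index(meta):
    # Hypothetical rule: a missing x-curated field means the version is curated.
    return meta.get("x-curated", True)


upload = {
    "name": "my-lib",
    "version": "0.1",
    "build-depends": ["base", "text"],
    "x-curated": False,  # boundless upload, opted out of the curated index
}
rev = revise_adding_bounds(upload, {"base": ">=4.10 && <4.12", "text": ">=1.2 && <1.3"})
print(in_curated_index(upload), in_curated_index(rev))  # False True
print(rev["build-depends"])  # ['base >=4.10 && <4.12', 'text >=1.2 && <1.3']
```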
I could use this approach myself when there are two different indices: I would probably upload a boundless version so I don't need to do revisions for test or benchmark dependencies on Stackage. (I find that very annoying.)
Curated packages cannot depend on uncurated packages, and the hackage server will detect this as an error at upload time.
I don't think this check is a hard requirement for this proposal to go forward.
Rather, the community should write documentation on how to use different indices, and improve tooling where necessary:
- cabal does support multiple indices, but switching between them is inconvenient.
- AFAIK stack doesn't support changing the index at the moment.
As there aren't uncurated packages in the curated index, the check could be as simple as: there should be an install plan for a new upload.
FWIW, this can (should?) be done even now. Or to put differently: if we don't want to do it for current Hackage, we shouldn't in curated/uncurated setting.
If the check is done without using a solver, i.e. "for each dependency there is some curated version", I'm afraid it won't work as intended in the long run.
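To illustrate the concern, here is a toy case where every dependency has some curated version, yet no consistent install plan exists. This is a hypothetical backtracking resolver over integer versions and explicit sets of allowed versions, far simpler than cabal's solver; `naive_check`, `solve`, `has_install_plan` and the package names b, c, d are placeholders.

```python
def naive_check(deps, index):
    """Pass if each direct dependency has at least one allowed version in the index."""
    return all(any(v in allowed for v in index.get(name, {}))
               for name, allowed in deps.items())


def solve(pending, index, plan):
    """Tiny backtracking resolver: pick one version per package so that every
    constraint in `pending` (and those it pulls in) is satisfied."""
    if not pending:
        return plan
    (name, allowed), rest = pending[0], pending[1:]
    if name in plan:  # already chosen: the constraint must agree with it
        return solve(rest, index, plan) if plan[name] in allowed else None
    for version in sorted(index.get(name, {}), reverse=True):  # newest first
        if version in allowed:
            new_constraints = list(index[name][version].items())
            result = solve(rest + new_constraints, index, {**plan, name: version})
            if result is not None:
                return result
    return None


def has_install_plan(deps, index):
    return solve(list(deps.items()), index, {}) is not None


# index: package -> {version: {dependency: set of allowed versions}}
index = {
    "b": {1: {"d": {1}}},
    "c": {1: {"d": {2}}},
    "d": {1: {}, 2: {}},
}
upload_deps = {"b": {1}, "c": {1}}
print(naive_check(upload_deps, index))       # True
print(has_install_plan(upload_deps, index))  # False: b needs d-1, c needs d-2
```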
If the check is implemented, it should only check library and executable dependencies. Tests and benchmarks don't need to be there. For example, it's common to benchmark against similar packages, which may be uncurated.
I don't understand how "x-uncurated" is considered an opt out of trustee revisions, if trustees have the power to "adopt" any uncurated packages.
Thanks to this question, I just realized that I think I got something wrong in the proposal. The reason it opts-out of revisions is that packages in the uncurated index do not include revisions. However, the implication in the proposal is that all packages in the uncurated index do not include revisions. This is wrong, I think. Rather, the uncurated index should be [curated with revisions + uncurated without revisions]. And the curated index should be [curated with revisions + adopted uncurated with revisions].
I think oppositely: uncurated index should be truly "uncurated", i.e. no revisions at all.
Good properties are:
- Metadata of packages in uncurated index is the same as in the tarballs. This means that "uncurated = completely unmutated".
- Then actions of Hackage Trustees won't interfere with e.g. Stackage curation.
Alternatively, a technically elegant solution is to have three separate, disjoint indices:
1. uncurated (x-curated: False)
2. curated first upload
3. (curated) revisions
In the current version of proposal we have Uncurated = 1 + 2, Curated = 2 + 3. If someone would need "curated with revisions + uncurated without revisions", that would be everything 1 + 2 + 3
Note: "curated with revisions + adopted uncurated with revisions" is 2 + 3.
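The three-index arithmetic can be sketched as set unions. In this minimal sketch, `Entry` is a hypothetical record for one .cabal entry, and its `curated` field stands in for the proposed x-curated property. A trustee revision is always treated as curated, since revisions remove the uncurated marker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One .cabal entry in a hypothetical package index."""
    package: str
    version: str
    revision: int
    curated: bool  # stands in for the proposed x-curated property


def uncurated(entries):      # set 1
    return {e for e in entries if not e.curated}


def curated_first(entries):  # set 2
    return {e for e in entries if e.curated and e.revision == 0}


def revisions(entries):      # set 3
    return {e for e in entries if e.curated and e.revision > 0}


entries = [
    Entry("foo", "1.0", 0, True),
    Entry("foo", "1.0", 1, True),   # trustee revision of foo-1.0
    Entry("bar", "2.0", 0, False),  # opted out of curation
]
uncurated_index = uncurated(entries) | curated_first(entries)  # 1 + 2
curated_index = curated_first(entries) | revisions(entries)    # 2 + 3
everything = uncurated(entries) | curated_first(entries) | revisions(entries)  # 1 + 2 + 3
```

Because the three sets are disjoint by construction, any of the combinations discussed here is just a union, and no entry has to be duplicated across indices.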
UI idea: as Hackage lists all versions of a package, non-curated versions could be shown in a different color or in italics, so they stand out from curated ones, the same way deprecated and preferred versions are highlighted.
first about if "uncurated" will have a sort of inevitable gravitational pull (the tendency of the universe towards chaos, entropy consuming all things, etc.). I was hoping adoption would stem that. But I would really like current hackage trustees themselves to weigh in, since they have the best feel for the dynamics of the ecosystem as it stands.
I think we should ask the maintainers of the "5% (650)" central packages you (@gbaz) identified how they would behave if this proposal were implemented.
Some authors (including me) put upper bounds on all dependencies, but as far as I know I don't personally maintain (in the sense of being able to decide that a package should have upper bounds) anything very close to the dependency root of Hackage.
Some authors don't want to put upper bounds on every dependency unconditionally, but aren't against revisions (like https://github.com/haskell-infra/hackage-trustees/issues/125#issuecomment-358230908), and I think Hackage Trustees can keep working with them in the future to have their packages in the curated index.
Hackage Trustees (though I speak only for myself) can handle the workload of adopting some versions of the rest of the closure, if there are maintainers who are completely against making x-curated: True releases themselves.
Curated packages cannot depend on uncurated packages
[...]
Hackage trustees will recognize and respect the uncurated flag, and not contact those who set it with any issues.
[...]
Uncurated packages may be "adopted" into the curated ecosystem by trustees.
What about the following scenario: I want my package to have good bounds and I would be happy to be contacted about issues, but I can't because one of my dependencies is uncurated and I don't have the energy to get it adopted. Later on, somebody does get that dependency adopted; I would like to be contacted so I can now make my package curated as well, but I won't be because the fact that my package is uncurated-despite-my-best-wishes will be interpreted as please-don't-contact-me!
Furthermore, I think wishing-to-be-contacted is a per-maintainer decision, not a per-revision decision. So it should be possible to change this decision when a package changes hands, or when a maintainer changes their mind. It also has no impact on building packages, so there's no reason to associate it with a version, a revision, or to keep it immutable. Maybe it could be a mutable flag on our Hackage profile page?
@bergmark writes
I'm a bit confused about what x-uncurated means. Is it that a release is currently only part of the uncurated index but can be revised by a trustee to be curated, or is it an opt-out for maintainers to say that a package will never be part of the curated index?
It means the former, which is what you were hoping for :-)
@gelisam raises a very good question:
What about the following scenario: I want my package to have good bounds and I would be happy to be contacted about issues, but I can't because one of my dependencies is uncurated and I don't have the energy to get it adopted. Later on, somebody does get that dependency adopted; I would like to be contacted so I can now make my package curated as well, but I won't be because the fact that my package is uncurated-despite-my-best-wishes will be interpreted as please-don't-contact-me!
This is a very good point. However, I do think the information should remain per-package, at least for the following reason: you may have stuff you care about and want curated, and also stuff you don't want curated and don't care about at all (e.g. point-in-time research artifacts).
Anyway, I think there's a good solution here, inspired by @tomjaguarpaw's comment about boolean blindness. We can have x-curation: curated (by default), x-curation: uncurated, x-curation: uncurated-adoption-sought and x-curation: uncurated-no-contact, or the like. Obviously the flag names need work. This also gives users a great onramp: the transitive check can suggest the adoption-sought flag, and trustees can look to fix up the blockers.
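A minimal sketch of how an upload check might use such a multi-valued field instead of a boolean. The enum values mirror the placeholder flag names above. The `check_upload` helper is hypothetical and is not part of hackage-server. Its policy is an assumption: a curated upload with uncurated dependencies gets the adoption-sought status and a list of blockers, instead of being rejected outright.

```python
from enum import Enum


class Curation(Enum):
    """Placeholder x-curation values; the names are not settled."""
    CURATED = "curated"  # the default when the field is absent
    UNCURATED = "uncurated"
    UNCURATED_ADOPTION_SOUGHT = "uncurated-adoption-sought"
    UNCURATED_NO_CONTACT = "uncurated-no-contact"


def parse_curation(field):
    """Parse an x-curation field value; None means the field was omitted."""
    return Curation.CURATED if field is None else Curation(field.strip())


def check_upload(declared, uncurated_deps):
    """Return (status to use, sorted list of blocking dependencies).

    Hypothetical policy: a curated upload whose dependencies include
    uncurated packages is downgraded to 'uncurated-adoption-sought', and
    the blockers are reported so trustees can look at them.
    """
    if declared is Curation.CURATED and uncurated_deps:
        return Curation.UNCURATED_ADOPTION_SOUGHT, sorted(uncurated_deps)
    return declared, []
```

Unlike a single x-uncurated boolean, the enum keeps "uncurated, but please contact me" distinct from "uncurated, do not contact me". That distinction is what @gelisam's scenario needs.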
I want to give a few days for the discussion here to proceed (especially for those that don't consider discussing package management their idea of weekend fun :-P). Following that, I'll revise the proposal to try to take a lot of the important ideas and points raised here into account.
Thanks for writing this up @gbaz. I'd like to focus on how this is intended to interact with Stackage. I believe the proposal is designed to account for the desires of Stackage as a downstream consumer; some commenters here may not be aware of this, but the proposal seems to be geared towards it.
Some authors of packages in Stackage wish to opt-out entirely from revisions and requirements of maintaining PVP bounds. Stackage would like to respect their wishes, and not pull in revisions that these authors do not agree to. (In some cases, such revisions have introduced unnecessarily strict upper bounds, causing authors to need to un-revise those bounds or upload new versions.) At the same time, Hackage Trustees wish to be able to provide revisions for consumption by cabal-install's dependency solver.
One response to all of this would be for Stackage to simply ignore all revisions, what I believe is known as the rev0 proposal. This unfortunately has its own limitations:
- Packages which follow the PVP from rev0 would then retain unnecessarily strict upper bounds for Stackage's usage, since Stackage would not receive the revisions which relax their upper bounds
- There would be significant user and author confusion by Stackage and Hackage having completely differing views of the world. It would be completely understandable for a Stackage user to file a report against a rev0 view of the cabal file, and an author to not understand why the user does not see the revisions already made.
As I understand it, this is the logic behind your comment above, which seemed to cause confusion:
Rather, the uncurated index should be [curated with revisions + uncurated without revisions]. And the curated index should be [curated with revisions + adopted uncurated with revisions].
In this world, Stackage would use the uncurated index as its upstream. Authors who have opted out of curation will have no revisions from trustees appear in Stackage, as they desire. However, authors who choose to introduce PVP-style bounds and then relax them with revisions will not be asked by Stackage curators/users to make a separate package upload to relax version bounds.
That's at least my understanding of this design. Have I read too much into it?
A few other questions which I think haven't been asked above, apologies if they're repeats:
- Should uncurated packages block author revisions to avoid confusion? Otherwise, we could end up in the same situation where an author uploads a package with restrictive bounds but opts out of curation, then uses revisions to relax the bounds for cabal-install, but Stackage does not see those changes.
- What is displayed as far as version information on the Hackage page? A large concern raised throughout these designs is the problem of each ecosystem displaying conflicting information. It would be ideal if the uncurated packages showed the rev0 information by default, perhaps with a link to display revision information.
- There's a greater question that all of this begs, which is whether we'd all be better off if Stackage ignored version bounds entirely. If I'm not mistaken, many people reviewing the initial SLURP proposal brought up this idea. On the one hand, I hesitate to even raise this idea here, since it's fairly tangential. But on the other hand, such a decision could have a major impact on design of a feature like this. I'll try to do a write-up this week of the arguments I heard for changing Stackage, and perhaps that discussion will shed light on this proposal.
Again, thank you for taking the ball on this proposal.
I don't know the rationale for Stackage not ignoring version bounds. Has this been discussed somewhere before that someone can link me to?
Something I've started wondering about, is what the actual difference is between making revisions to fix bounds, and simply making new standard releases with the same fix.
The biggest difference would be that the old "unrevised" versions of packages would still be visible to cabal-install, since all the revisions would be equally unprivileged versions. That would mean that if X depends on Y, a new Y breaks X, and X is updated to lock Y out of its bounds, cabal-install might happily continue to pick the old X, which (as far as its declared version bounds are concerned) works with the new Y.
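The failure mode above can be reduced to a toy model. This sketch is an assumption-laden simplification, not how cabal-install's solver works: Y is fixed and the newest X whose declared bounds admit it is picked. Versions are tuples or integers, and the metadata is hypothetical.

```python
def allows(rng, y):
    """rng is (lo, hi) with hi=None meaning no upper bound."""
    lo, hi = rng
    return lo <= y and (hi is None or y < hi)


def newest_x_for(x_versions, y):
    """Newest X whose *declared* bounds admit Y, or None."""
    ok = [x for x, rng in x_versions.items() if allows(rng, y)]
    return max(ok) if ok else None


# Hypothetical metadata: X version -> declared range on Y (integer versions of Y).
releases_only = {
    (1, 0): (1, None),  # old X: no upper bound, but really broken by Y-2
    (1, 1): (1, 2),     # new release of X that locks Y-2 out
}
# A revision instead edits X-1.0's metadata in place, replacing the old bounds.
with_revision = {**releases_only, (1, 0): (1, 2)}

print(newest_x_for(releases_only, 2))  # (1, 0): the broken old X still qualifies
print(newest_x_for(with_revision, 2))  # None: no X claims to work with Y-2
```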
What revisions gain us over the situation above is really only that when a revision to a package version is released, earlier revisions of the same package version are hidden from cabal-install. This is where the "mutable" concern comes from: I can install the same version on two different days and get different results.
So what if, instead of revisions, we had the ability to mark particular package versions as "subsumed" by later versions, and cabal-install would only choose a subsumed version if it was specified explicitly (or perhaps only if there are no non-subsumed versions in the version bounds)? You could even add an extra sub-point-release field to the version number (or, equivalently, consider the existing revision numbers to be part of the version number).
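A minimal sketch of the candidate filtering this idea implies. The `candidates` function and the "subsumed" marking are hypothetical and do not exist in cabal-install. Versions are tuples, so a sub-point release such as 1.4.3.2.1 sorts just after 1.4.3.2. The `fallback` flag models the parenthetical variant.

```python
def candidates(versions, subsumed, lo, hi, exact=None, fallback=False):
    """Versions a hypothetical 'subsumed-aware' solver would consider.

    versions: all released versions (tuples); subsumed: versions marked as
    superseded by a later release; [lo, hi) is the dependency range.
    """
    if exact is not None:
        # An explicit pin still selects a subsumed version.
        return [exact] if exact in versions else []
    in_range = [v for v in versions if lo <= v < hi]
    live = [v for v in in_range if v not in subsumed]
    if live or not fallback:
        return sorted(live, reverse=True)
    # Variant: fall back to subsumed versions only if nothing else fits.
    return sorted(in_range, reverse=True)


# 1.4.3.2.1 is a sub-point release that fixes the bounds of 1.4.3.2.
versions = {(1, 4, 3, 2), (1, 4, 3, 2, 1)}
subsumed = {(1, 4, 3, 2)}
print(candidates(versions, subsumed, (1, 4), (1, 5)))  # [(1, 4, 3, 2, 1)]
print(candidates(versions, subsumed, (1, 4), (1, 5), exact=(1, 4, 3, 2)))  # [(1, 4, 3, 2)]
```

Every release stays immutable here. Only the solver's default view changes, and an exact pin still reaches the older release.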
Then all the individual releases are explicitly immutable. When you leave your version numbers unconstrained (partially or wholly), you see a "mutable" view of Hackage with new revisions appearing unpredictably (exactly as currently happens with point releases if you're using PVP bounds). But if you want to pin exact version numbers, you can guarantee that this won't happen to you, because you can pin the revisions as well, with the same system you use to pin version numbers (bounds in the .cabal file or the cabal-install command line). And the version numbers you get when downloading from Stackage would tell you exactly which revision on Hackage would give you the same contents.
The "do I allow my package to be curated or not" decision would then be recast as "do I allow people other than me to make new releases to fix version bounds problems".
It seems like this basically is the system we already have, only because revision numbers are an extra system separate from ordinary version numbers, you can't handle them the same way. There's no way that I know of to even address a revision from a cabal file, even though you can through Hackage. If we could unify revisions and ordinary releases (and add some stuff about granting permission for the trustees or other people to create only revision-like releases), how far would that go towards reducing the friction between people who want there to be revisions and people who don't?
I don't know the rationale for Stackage not ignoring version bounds. Has this been discussed somewhere before that someone can link me to?
No, it's never been discussed. The tl;dr is that it could lead to a lot of author confusion and frustration that they're receiving bug reports about dependency versions they aren't ready to start supporting. But I do intend to write this theory up correctly and let others comment there.
So what if instead of revisions, we had the ability to mark particular package versions as "subsumed" by later versions.
One concrete proposal I've seen in the past is a small tweak to cabal-install's dependency solving that ignores all but the most recent patch release when solving dependencies. This means that, if I've released version 1.4.3.2 of a package, and we discover later that it has incorrect bounds information, I could release 1.4.3.3. Existing build plans that are hard-coded to 1.4.3.2 would be unaffected, but anyone running the dependency solver in the future would automatically ignore 1.4.3.2. I believe this would also have a positive performance impact, by automatically shrinking the search space.
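A minimal sketch of the filtering step such a tweak implies, assuming that "patch release" means the fourth version component, as in the 1.4.3.2 to 1.4.3.3 example. `solver_view` is a hypothetical name and does not exist in cabal-install.

```python
def solver_view(versions):
    """Keep only the newest patch release of each A.B.C series.

    Assumption: the 'patch' component is the fourth one, as in the
    1.4.3.2 -> 1.4.3.3 example; versions are tuples of ints.
    """
    latest = {}
    for v in versions:
        series = v[:3]
        if series not in latest or v > latest[series]:
            latest[series] = v
    return sorted(latest.values())


released = [(1, 4, 2, 0), (1, 4, 3, 2), (1, 4, 3, 3), (1, 5, 0, 0)]
view = solver_view(released)
print(view)  # [(1, 4, 2, 0), (1, 4, 3, 3), (1, 5, 0, 0)]
```

The filtered view is never larger than the full list, which is where the search-space reduction comes from. Plans that pin an exact version would look it up directly and skip this filter; that part is not modelled here.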
@snoyberg Yes, that sounds pretty much what I'm talking about.
So my question is, if we had that tweak to cabal-install and allowed a wider set of people than the original author to make patch releases (the Hackage trustees, to be close to the current situation), how close would that get us to addressing the concerns that the curated/uncurated split is seeking to address?
So my question is, if we had that tweak to cabal-install and allowed a wider set of people than the original author to make patch releases (the Hackage trustees, to be close to the current situation), how close would that get us to addressing the concerns that the curated/uncurated split is seeking to address?
The uncurated layer is not addressed only (or primarily) to the issue of revisions. It is about allowing packages to be "opted out" more generally from promises as to stability, maintenance, etc. Many people that today do not want to participate in revisions would tomorrow not want to interact with extra-patch-releases. From my standpoint, I don't consider ideas along those lines as tackling the same problem, and wouldn't consider them relevant to this discussion.
What revisions gain us over the situation above is really only that when a revision to a package version is released, earlier revisions of the same package version are hidden from cabal-install.
I've already heard variants of your suggestion and unfortunately they don't meet all requirements... Revisions are quite important cost-wise, especially in cabal's nix-style model. A new version inevitably forces a new Nix-style hash, invalidating the existing cache and thus forcing recompilations, whereas things that can be done via a revision don't invalidate existing valid install-plans (this is the killer feature of revisions over reified versions). IOW, new versions inevitably grow the configuration space and thrash the Nix-style caches (which would also make http://matrix.hackage.haskell.org/'s goal less feasible). Moreover, new versions also come with a greater storage and bandwidth cost: the metadata is orders of magnitude smaller than the associated .tar.gz, and tarballs don't benefit from delta/dictionary compression the way .cabal files in the package index do.
As such, revisions are a very clever tool and optimisation that we have had since around mid-2014 (I started making my first revisions in June 2014, and revisions started bearing visible fruit very soon in the pursuit of a better UX for Hackage/Cabal users). I see no good replacement for them that doesn't regress in some way relative to the status quo, as the actual benefits for me significantly outweigh the perceived downsides.
things that can be done via a revision don't force invalidation of existing valid install-plans
I don't want to derail this discussion too much, but this statement shocked me! Based on what I know, it seems like revisions would be forced to invalidate caches. I say that because it's possible (see acme-mutable-package) to change how a package behaves using only revisions. I don't think that's likely, but the fact that it can happen at all suggests that revisions need to bust caches.