ecosystem-proposals

SLURP: a Single Liberal Unified Registry of Haskell Packages

Open simonmar opened this issue 7 years ago • 240 comments

NOTE: the proposal is currently "parked" pending discussions of alternatives, see this comment.

Rendered proposal

Hackage has been extraordinarily successful as a single repository through which to share Haskell packages. It has supported the emergence of a variety of tools to locate Haskell packages, build them and install them (cabal-install, Stack, Nix, ...). But in recent years there has been increasing friction over,

  • Hackage’s policies, especially concerning version bounds;
  • Hackage's guarantees, especially around durability of package content and metadata;
  • Hackage's features, especially the visual presentation and package documentation.

If we do not resolve this friction, it seems likely that the Haskell library ecosystem will soon “fork”, with two separate repositories, one optimised for Cabal and one for Stack. This would be extremely counter-productive for Haskell users.

Thus motivated, over the last few months we have talked a lot to colleagues, including ones in the Hackage and Stack communities. We have emerged with SLURP, a proposal that could go a long way towards supporting the upsides of a diverse ecosystem, without the sad downsides of forking into mutually-exclusive sub-communities.

Here is the SLURP proposal. We invite the Haskell community to debate it.

SLURP is meant to enable both Hackage and Stackage (and perhaps more services in the future) to make choices autonomously without hurting other package services. But it will only work if the implementors of both Hackage and Stackage are willing to participate. We respect their autonomy in this matter, but we urge them to give this proposal serious consideration in the best interests of the community and Haskell's success. We have carefully designed SLURP to be as minimal and non-invasive as possible, so that it can be adopted without much trouble. Of course, we are open to debate about the specific details.

We do have an offer from someone willing to implement SLURP.

We also strongly urge members of the community to express clear views about the importance --- or otherwise --- of adopting something like SLURP. You are, after all, the community that GHC, Hackage, Stackage, Cabal, etc are designed to serve, so your views about what best meets your needs are critically important.

Mathieu Boespflug (@mboes), Manuel Chakravarty (@mchakravarty), Simon Marlow (@simonmar), Simon Peyton Jones (@simonpj), Alan Zimmerman (@alanz)

simonmar, Jan 22 '18 10:01

To be honest, I'm rather sceptical about this proposal being effective at achieving the stated goals. As such, I'm not convinced enough of its merits to support this.

hvr, Jan 22 '18 10:01

perhaps we could somehow trial this thing by running it in reverse: SLURP is created and exists solely as a mirror of, say, hackage and stackage. it somehow watches them and inputs new packages immediately; then we can see if anyone actually adopts it and finds it useful?

silky, Jan 22 '18 11:01

To correct a terrible mistake:

In Hackage, for example, the presence of revisions may mean it will not.

That's not true. The tarballs aren't ever mutated on Hackage. To apply a revision, a client needs to consult Hackage's index (01-index.tar).

EDIT: Check https://github.com/haskell-infra/hackage-trustees/blob/master/revisions-information.md for more information about revisions.


I don't see a technical reason not to make SLURP tarballs immutable. What the additional metadata is and how it's fetched is up to the particular repository (Hackage revisions, Stackage snapshots, ...)


I have to reread the proposal to get a proper understanding of it before I comment more.

phadej, Jan 22 '18 11:01

You can think of SLURP as an authoritative URL-shortening service for packages.

No, please. This implies moving from a single point of failure with high pressure for availability (hackage) to multiple points of failure with availability discovery by chance.

If you say “federalized and decentralized”, this is exactly how to not build a federalized/decentralized system.

Semi-related opinion: Uniformity of data is actually a good thing.

Profpatsch, Jan 22 '18 12:01

-1

It is unfortunate what is happening. This will not solve it. It will make it worse.

tonymorris, Jan 22 '18 12:01

After some thought, I feel as if this proposal at once goes too far and not far enough.

The key element to this proposal is to retain a single, authoritative mapping of package names (+versions) to sources. But this is actually a pretty strong guarantee which is already somewhat violated. For example, nix does patch some (few) Haskell packages in order to be compatible with its scheme. By keeping the single authoritative source, we don't work around issues such as that arising with cassava recently, or other issues where some changes in a package itself might cause issues with a particular distribution.

At the same time, allowing hackage to become just "one-of-many" would make it difficult to satisfy some of the goals set out here. If some critical package (either current or future) decides to move off of Hackage as their authoritative source, say, then the part of the ecosystem depending on that package becomes unavailable to cabal unless it moves over to using slurp directly, which would be a much bigger change, or unless Hackage ends up hosting a version of said package anyway, which would violate the slurp guarantees.

On the other hand, the more I think about it, the more a "soft fork" seems like not so much of a bad thing. If stackage moved to a model where, by default, packages come from Hackage but could be overridden on a case-by-case basis, then it's not clear that too much harm or fragmentation would result. Obviously the nightmare case is that there's a divergence in some core package, but this case seems pretty unlikely, since it benefits nobody. On the other hand, say, if two parsers happen to differ in such a way as to cause differing results on a package, temporarily forking that package until the parsers can be updated seems to be no bad thing. Again, this mirrors much of what already happens in nix or in any linux distribution, where packages are taken from upstream by default, but can be overridden if needed.

Obviously, one could still have the case where critical packages moved to being hosted elsewhere and thus became unavailable to cabal, but this would seem less likely since Hackage would still be considered the "central" package source, if no longer the completely authoritative one.

nc6, Jan 22 '18 12:01

You make the following assertion:

If we do not resolve this friction, it seems likely that the Haskell library ecosystem will soon “fork”, with two separate repositories, one optimised for Cabal and one for Stack.

Could you detail why you think this is the case? As far as I understand, Stackage is just a subset of hackage. Stackage requires maintainers to get their package working with a particular snapshot, but hackage on the other hand just leaves it down to maintainers to try their best and if packages don't work together you can either try --allow-newer or submit a patch.

I think the current arrangement works well. Stackage for more mature packages you're prepared to put the extra work into keeping up to date with the ecosystem and Hackage for more experimental packages. You can always depend on a Hackage package in a Stackage package by just explicitly mentioning the Hackage package version.

Have I missed something about how this works? I don't see Hackage and Stackage as competitors, but completely complementary parts for slightly different levels of package maturity.

clintonmead, Jan 22 '18 12:01

I might be missing the point here. But...

One seriously great thing about both Stackage and Hackage is the documentation both provide. Packages basically all get documentation in the same familiar format, link into each other happily via types, module links, functions etc., we even get lovely links to source files, and even lovelier links between source files! This is something I don't want to give up.

How will this work under SLURP, where I assume docs are neither guaranteed (if the canonical home for a package is just e.g. some guy's github repo) nor guaranteed to have a similar structure? It seems inevitable, if SLURP is the canonical home for packages, that it (or a hypothetical SLURPage) will be the canonical "giver of links" to the homepages of packages, and that it may as well generate API docs while it's at it.

mikeplus64, Jan 22 '18 13:01

FYI, Mathieu Boespflug's handle is @mboes.

tomjaguarpaw, Jan 22 '18 13:01

This seems like an unnecessary complication in the ecosystem. I agree with the above comment that this multiplies our single point of failure.

Furthermore I don't see how SLURP will achieve its goals. It seems to me that implementing this proposal will cause the fork it aims to prevent.

gwils, Jan 22 '18 13:01

Besides the individual review points, I have an overall question:

I am not sure how it solves the problem. Both hackage and stackage will offer functionality beyond what is defined by SLURP, which is required by the corresponding tools to work (Stackage: the package sets; Hackage: the version bounds). But this means that stack and cabal-install, respectively, will not actually be able to use SLURP; they will only be able to use the packages provided by their corresponding hosting service. But if I want to provide package foo for users of both tools, I now have to upload it to both services. But SLURP allows the name foo to only point to one of them. So I will have to upload foo-stackage and foo-hackage? Clearly that is not the desired state of affairs … where did I make a mistake?

nomeata, Jan 22 '18 14:01

:+1: I am strongly in favor of this proposal. I think it will allow all parties (Stackage, Hackage, Nix, and others) to continue improving independently without stepping on each other's toes or compromising the uniqueness of Haskell package names.

tfausak, Jan 22 '18 14:01

I like this proposal: I think that a common namespace for Haskell packages is a thing that we definitely want in a shared ecosystem. This is a minimal proposal to get it.

A.

However, I do think that @nomeata has a good point: It may happen that a particular name is specific to one hosting-service and not useable by others — if the common package metadata is not enough for some hosting-service to provide their service. I don’t think that this problem can be avoided if we want to allow experimentation in metadata.

It is still better than a fork, because in this proposal, the package name stays unique, at least. (In a fork, we could have two very different packages with the same name.)

But I concur that a way to offer multiple variants of a single name for different hosting providers would be nice.

A2.

As I understand, the proposal does not actually allow experimentation in metadata. In this case, I think that we need to make the metadata specification a part of the proposal. The specification linked in the proposal is not good enough, because it is out of date. De facto, Cabal-the-spec is implicit in Cabal-the-library. The main problem with this is that it leaves open the question of “Who is allowed to change the common Spec in the future?”. Right now, the answer is “The people with push privileges to Cabal-the-library”. If the proposal wants to specify a common package metadata, then it also needs to address the question of ownership of the metadata format, thereby actually defining what common means.

B.

On point 6.4: “Name squatting”: I do not think that allowing PUT without authentication is a good way to solve this. Fortunately, this is not about any core problem that the proposal wants to address, but it is about automated protection from SPAM. There are various tools for that, like user authentication, or CAPTCHAs. A very simple solution would be to allow PUT only from trusted hosting-services like Stackage and Hackage.

HeinrichApfelmus, Jan 22 '18 15:01

There seems to be a contradiction in these two statements:

bytestring-0.10.8.2 is bytestring-0.10.8.2 no matter where you downloaded it from today (which might be a different place from where you'll download it tomorrow).

we do not impose that getting a particular package tarball for (say) pkg-3.5.3 always yields the same result each time

tomjaguarpaw, Jan 22 '18 15:01

@nomeata thanks for the review. Will respond here to the top-level comment.

I am not sure how it solves the problem. Both hackage and stackage will offer functionality beyond what is defined by SLURP, which is required by the corresponding tools to work (Stackage: the package sets; Hackage: the version bounds).

Hackage is currently a package store, plus a bunch of other things (documentation server, downloads counter, ratings tracker, etc). Mind you, there are plenty of other package stores too, but using them exclusively is currently problematic. For example, GitHub is a package store: it's possible to list versions and to download tarballs of releases associated with repositories. All other public instances of hackage-server are also package stores. And I have heard of several plans for more in the future.

Stackage is not exactly a package store: one way to see it is as a mirror of a controlled subset of Hackage. Packages aren't (currently) uploaded directly to Stackage.

Where does SLURP fit in? Well, if people are going to introduce their own public package stores (as the Simons are saying will likely happen), then we need to make using these other package stores not problematic. I won't comment on these future package stores, so let's go back to the GitHub-as-package-store example: what's the problem with only hosting package tarballs directly on GitHub? It's that if tools start pulling from this package store directly, in addition to Hackage, then there is no point of coordination for choosing who owns what package name. That is to say, user A could upload package foobar to GitHub, even have it be included in future Stackage snapshots say, while user B might concurrently upload foobar to hackage.haskell.org. Without SLURP, user A and user B have no way to coordinate about package names and we end up with both user A and user B vying for the foobar name.

You might say, "yes but today whoever uploaded first to Hackage is whoever owns the name foobar". That's true. But if tomorrow a significant part of the community no longer does that (and this is not a theoretical concern), then we have a problem on our hands and introducing something like SLURP makes that scenario less problematic.

How would services like Stackage deal with multiple package stores in the future? Just like it does now with Hackage: users send a pull request saying which new package to include in Stackage, and Stackage mirrors (an immutable copy of) that. Hackage could do the same as well, by including in the 00-index.tar.gz file the .cabal metadata for all packages registered in SLURP, irrespective of their location. Or cabal-install could be updated. In sum, no need for uploading foo-hackage here and foo-stackage there.

SLURP focuses on one problem and one problem only: coordinating package name ownership and making sure the owner of a package name is advertising available versions in a standard way.
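To make the coordination point concrete, here is a minimal sketch of first-come name ownership. It is only an illustration under stated assumptions, not the proposal's API: the `Registry` class, its `claim` and `resolve` methods, and the `NameTaken` exception are hypothetical names, and the registry is a plain in-memory dictionary rather than a real service.

```python
from dataclasses import dataclass, field


class NameTaken(Exception):
    """Raised when a package name is already registered to another location."""


@dataclass
class Registry:
    # Hypothetical in-memory stand-in for a SLURP-like registry:
    # package name -> URL of the package store that owns the name.
    owners: dict = field(default_factory=dict)

    def claim(self, name: str, location: str) -> None:
        """Register `name` for `location`; the first claim wins."""
        current = self.owners.get(name)
        if current is not None and current != location:
            raise NameTaken(f"{name} is already registered at {current}")
        self.owners[name] = location

    def resolve(self, name: str) -> str:
        """Return the location that owns `name` (KeyError if unregistered)."""
        return self.owners[name]


# User A registers foobar first; user B's later claim is rejected
# instead of silently producing two different packages called foobar.
registry = Registry()
registry.claim("foobar", "https://github.com/alternative/foobar")
try:
    registry.claim("foobar", "https://hackage.haskell.org/package/foobar")
except NameTaken as err:
    print(err)
```

In this sketch the registry stores only the name-to-location mapping; tarballs, documentation and any other metadata stay with whichever package store owns the name.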

mboes, Jan 22 '18 15:01

SLURP focuses on one problem and one problem only: coordinating package name ownership and making sure the owner of a package name is advertising available versions in a standard way.

Aha! Then I think you should make this point upfront in big bold letters (metaphorically) because currently the proposal seems to be a lot more wooly than that.

tomjaguarpaw, Jan 22 '18 15:01

If we do nothing it seems likely that the Stack community will create a separate Haskell package repository.

It would be nice if this was supported with some justification. At the moment SLURP looks like a solution in anticipation of a problem which may never occur.

tomjaguarpaw, Jan 22 '18 15:01

SLURP focuses on one problem and one problem only: coordinating package name ownership and making sure the owner of a package name is advertising available versions in a standard way.

Is a possible name collision really the biggest problem that we are facing in light of a possible ecosystem split? (and are we not rather making such splits more likely?)

Here is a hypothesis: Every tool that will start pulling from more sources than just hackage (or just stackage) will just use full URLs to disambiguate the packages. Or come up with some other solution to distinguish https://hackage.haskell.org/package/foobar from https://github.com/alternative/foobar. Both stack.yaml files and cabal.project files already support that – with the intention of supporting development versions, the infrastructure seems to be there already.

I am not sure if we are solving the right problem, and if there is a problem in the first place.

nomeata, Jan 22 '18 15:01

@HeinrichApfelmus

The specification linked in the proposal is not good enough, because it is out of date. De facto, Cabal-the-spec is implicit in Cabal-the-library. The main problem with this is that it leaves open the question of “Who is allowed to change the common Spec in the future?”. Right now, the answer is “The people with push privileges to Cabal-the-library”.

Yup, and this is an important problem. Thanks for bringing it up. This proposal, however, takes things one step at a time. For now, Cabal-the-spec = Cabal-the-library as you say, for lack of anything better. If some folks could get together to propose a new working group or somesuch to keep Cabal-the-spec up-to-date, then that would be fantastic. It's just that changing what Cabal-the-spec means is out of scope for this particular proposal. In this proposal we just assume the status quo. If the meaning of Cabal-the-spec somehow changes in the future, then SLURP trustees will enforce that instead of the current specification-by-implementation.

mboes, Jan 22 '18 15:01

If we do nothing it seems likely that the Stack community will create a separate Haskell package repository.

Why would the stack community want to create a separate Haskell package repository?

I think this proposal is lacking motivation whilst the elephant in the room is not addressed.

mpickering, Jan 22 '18 16:01

No tool knows how to use SLURP, nor will immediately know how to in the future. This means that tools will need to change to use SLURP regardless. If tools need to change, there is an easier way to change that involves no creation of any new upstream system, no management of its hosting and backup, and no provisioning of security systems at all:

Simply allow namespacing prefixes to refer to different package registries, and allow cabal files (and stack.yaml files, etc) to declare which prefixes associate to which registries.

Mock syntax:

repo-aliases: foo-repo=https://foo.com/repo, bar-repo=https://bar.com/repo

build-depends: foo-repo+some-package, bar-repo+some-package
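As a rough illustration of how a tool might interpret the mock syntax above, here is a small sketch. It assumes the `alias=url` and `alias+package` forms exactly as written in the mock; the function names `parse_aliases` and `resolve_dependency` are hypothetical and do not correspond to anything in Cabal or Stack.

```python
def parse_aliases(field_value: str) -> dict:
    """Parse a 'repo-aliases' value such as 'a=https://x, b=https://y'."""
    aliases = {}
    for entry in field_value.split(","):
        # Split on the first '=' only, so the URL part is kept intact.
        alias, url = entry.strip().split("=", 1)
        aliases[alias.strip()] = url.strip()
    return aliases


def resolve_dependency(dep: str, aliases: dict) -> tuple:
    """Split 'alias+package' and return (registry URL, package name)."""
    alias, package = dep.strip().split("+", 1)
    # An undeclared alias raises KeyError rather than guessing a registry.
    return aliases[alias], package


aliases = parse_aliases(
    "foo-repo=https://foo.com/repo, bar-repo=https://bar.com/repo"
)
for dep in "foo-repo+some-package, bar-repo+some-package".split(","):
    print(resolve_dependency(dep, aliases))
```

Here the same package name resolves to two different registries, which is the point of the prefix: the name alone no longer has to be globally unique.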

This proposal seems to introduce many confusing problems and solve no important ones.

gbaz, Jan 22 '18 16:01

So, on reflection there's an aspect of this that I missed. I still have the above concerns, but this might overrule them.

Thinking about name authority, there are two separate issues:

  • Deliberate overriding of the package under a given name to deal with distribution issues. This should be allowed and supported.
  • Accidental name clash following a hard fork (e.g. multiple actual repositories as storage locations, rather than persistent overlays).

I was largely thinking about the former situation, which it reads a bit like this proposal aims to prevent. But perhaps it's the latter situation. So if there is likely to be a hard fork, and if we want to stick with un-namespaced packages (at least for the time being) I can see the merit in this proposal.

nc6, Jan 22 '18 16:01

On a personal note, the process of arriving at this state has been frustrating. I put in enormous amounts of time on a prior iteration of this proposal that was utterly different than this one, and we arrived at a situation where there was no objection (among the 25-ish people cc-d on two different threads) to something that proposed to avoid a fork by improving and modifying the candidate mechanism to serve an "uncurated" purpose (through a new layer) and could leverage existing tech at every step rather than introducing a new component.

The last message in that discussion, roughly 20 days ago, seemed positive on the direction that had gone.

Now, without, as best as I can tell, most or any of the people in those discussions having been followed up with, we have an entirely different proposal, sharing the same acronym, whose purpose is not to avoid a fork, but to facilitate one. In this proposal there is no discussion or mention of the fact that a prior proposal, that seemed just fine and more straightforward, had ever been considered, nor of what people consider may have been wrong with it.

This is not how to coordinate a discussion, and it leaves a very bad taste in my mouth.

gbaz, Jan 22 '18 16:01

Can a specific example of needing to put something on Stackage that cannot go on Hackage be added? That is what I understand the motivation of this proposal to be, if I have read between the lines correctly. It would strengthen the proposal to be more explicit.

chreekat, Jan 22 '18 16:01

The main problem with this is that it leaves open the question of “Who is allowed to change the common Spec in the future?”. Right now, the answer is “The people with push privileges to Cabal-the-library”.

This proposal, however, takes things one step at a time. For now, Cabal-the-spec = Cabal-the-library as you say, for lack of anything better. [...] If the meaning of Cabal-the-spec somehow changes in the future, then SLURP trustees will enforce that instead of the current specification-by-implementation.

Sure, I can see that using Cabal-the-spec = Cabal-the-library for the time being is an acceptable solution. But it still needs to be recorded in the proposal itself! At a minimum, the sentence

"If the meaning of Cabal-the-spec somehow changes in the future, then SLURP trustees will enforce that instead of the current specification-by-implementation."

needs to be included in the proposal. My point is that it's not so much a matter of "What is the spec?" but "Who decides what the spec is?". The latter question has to be answered as part of the proposal text, otherwise the spec is effectively undefined / subject to control outside of this proposal.

(To facilitate discussion about this, it may make sense to remove the name "Cabal" from one of the concepts. For instance, the format could be called "Common Haskell package metadata format" instead of Cabal-the-spec, or the tool for installing packages from Hackage could be called hackage install.)

HeinrichApfelmus, Jan 22 '18 16:01

@nomeata

Or come up with some other solution to distinguish https://hackage.haskell.org/package/foobar from https://github.com/alternative/foobar. Both stack.yaml files and cabal.project files already support that – with the intention of supporting development versions, the infrastructure seems to be there already.

Yes, at the cost of not being able to combine both in the same project (that can become increasingly difficult when you start having 200+ transitive dependencies). Further, it requires ancillary data that is not part of the commonly agreed core to disambiguate. cabal-install uses cabal.project files, and Stack uses stack.yaml files, but neither interoperates with the other. Further still, if a package changes location (say from Hackage to GitHub or vice versa), then all this metadata, whether it's in the de-facto-standard .cabal files or in the non-standard cabal.project file (or stack.yaml), will need to be updated everywhere. That's why a key principle stated in the proposal is independence from provenance:

Package names should not depend on their provenance: bytestring-0.10.8.2 is bytestring-0.10.8.2 no matter where you downloaded it from today (which might be a different place from where you'll download it tomorrow).

That's why there's a difference between nicely supporting temporary overrides for in-development packages, and nicely supporting packages hosted in multiple places (not just Hackage) quasi-permanently.

Disambiguating names might seem like a small service to offer. It can nonetheless be of crucial importance. SLURP is a proposed means to do so in a way that requires little to no changes to existing tooling (and small changes to existing platforms). And the implementation effort is commensurately very small (including while preserving the security properties you seek - topic for another comment).

mboes, Jan 22 '18 16:01

If the main reason for SLURP is that Stackage people want to fork Hackage, it's a bit strange that they are not among the proposal's authors.

23Skidoo, Jan 22 '18 16:01

Yes, at the cost of not being able to combine both in the same project (that can become increasingly difficult when you start having 200+ transitive dependencies).

Is that true? The multi-source-aware-buildtool should be able to use appropriate -this-unit-id flags when building the name clashing packages to encode the source repository?

If I am right, then it is technically possible to combine multiple packages with the same name from different sources, and it presumably can be left to innovative service-and-tool-authors (rather than a rigid standard) to make this practically usable.

Or maybe I am not seeing how users, or rather the users' tools (stack, cabal), are going to make use of SLURP. Will they really use the SLURP API? Or will they continue to use their own hosting service (since they presumably rely on the additional metadata there)?

nomeata, Jan 22 '18 16:01

I like this idea. So, let me see if I can summarize the proposal in my own words:

SLURP will basically be like DNS for Haskell packages (but an append-only registry) and the maintainers of SLURP will be analogous to ICANN. You query SLURP and get a URL pointing to the code you need, while Cabal continues to serve as the metadata protocol for constructing build plans, and you can use whatever build-planning app you like most (Cabal-install, Stack, etc.) to execute the build plans.

While I do like this idea of SLURP, I would prefer you flesh out the details of the namespaces, which I think will be absolutely essential. I think each namespace should be maintained by a group of people with a common interest in maintaining the quality of that namespace.

Each namespace should function as a sort of publishing house, the maintainers of which can dictate their own rules for quality control. Namespaces, like org.haskell/package, org.haskellstack/package, should be considered official and may not be changed without consensus from the community at large. Whereas (for example) a university or government organization, e.g. uk.ac.gla/package or ch.cern/package, may create their own namespace with their own separate peer-review process for submitting packages to this namespace. A community of game developers may restrict anyone but their own members from publishing a package to that namespace.

The packages within the "experimental" namespace should function as Hackage does right now, but maybe could be ordered into a second level of sub-namespaces, like experimental.network/package and experimental.parsers/package to alleviate further naming conflict.

Perhaps also there could be a limit of two levels of nesting of sub namespaces to prevent abuse of this function.

This is good because it will organize Haskell software into truly a universal naming scheme that will function more like a Linux package repository (without hosting the actual packages).

RaminHAL9001, Jan 22 '18 16:01

For everyone wondering why Stackage might "fork" and introduce its own package namespace, it's answered in the first paragraph of the proposal:

Hackage has been extraordinarily successful as a single repository through which to share Haskell packages. It has supported the emergence of a variety of tools to locate Haskell packages, build them and install them (cabal-install, Stack, Nix, ...). But in recent years there has been increasing friction over,

  • Hackage’s policies, especially concerning version bounds;
  • Hackage's guarantees, especially around durability of package content and metadata;
  • Hackage's features, especially the visual presentation and package documentation.

In short:

  • Hackage suggests following the PVP, which some package maintainers disagree with.
  • Hackage allows packages to be revised, which some package maintainers want to avoid.
  • Hackage builds and presents documentation, which some package maintainers want to do themselves.

If you were a package author/maintainer that wanted to follow SemVer, preferred immutable packages, and built your own documentation, Hackage does not benefit you and in fact may make your life harder. You could avoid uploading to Hackage altogether, but then your package name is not guaranteed to be unique. SLURP would allow you to avoid Hackage and still have a unique package identifier.

tfausak, Jan 22 '18 16:01