
RFC: Packages as (optional) namespaces

Open Manishearth opened this issue 2 years ago • 149 comments

🖥️ Rendered 🖥️


This was previously discussed as a pre-RFC here, and later iterated upon in this repo.

There is a prototype of this RFC available at https://cratespaces.integer32.com/, with instructions here.

This RFC brings forth a long-discussed and desired feature: some kind of namespacing for crates.io, achieved via treating "toplevel" packages themselves as namespaces, with ownership being inherited.

Please read before participating

I'm going to include some of the same disclaimers I included in the pre-rfc:

The general topic of namespacing is one that has been discussed many times. In my understanding, a lot of the tension here comes from people having different problems they wish to solve, and these discussions not always acknowledging that, leading to people feeling unheard.

This proposal comes out of many years of one-on-one discussion, reading threads, and incubation. I do think this is a viable and good path forward, but for this to work out we need to be our best selves here.

I would like to ask everyone to keep this discussion constructive and respectful. If a discussion is getting super involved, it may be worth opening a discussion issue on https://github.com/Manishearth/namespacing-rfc/ and linking to it instead.

This RFC is attempting to solve the problem of making crate ownership clear in terms of other crates; it is not attempting to solve the general problem around the squatting of root crate names. As such, if a crate can be published today under a particular name by anyone, it would also be publishable under the same name by anyone in the world of this RFC proposal. I would prefer if we did not spend much time discussing what I see as the radically different problem of squatting in general.
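To make the inherited-ownership rule concrete, here is a minimal sketch under stated assumptions: `/` is used as the separator (the separator choice is still being discussed), ownership lives in a simple in-memory table, and `Registry` and `can_publish` are hypothetical names rather than anything crates.io exposes. Root names stay first-come, first-served, as the paragraph above says; only names under an existing root are gated on owning that root.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory registry: crate name -> owner logins.
struct Registry {
    owners: HashMap<String, HashSet<String>>,
}

impl Registry {
    /// Returns true if `user` may publish `name` under the inherited-ownership rule.
    fn can_publish(&self, user: &str, name: &str) -> bool {
        let owns = |krate: &str| self.owners.get(krate).map_or(false, |o| o.contains(user));
        match name.split_once('/') {
            // Root name: unclaimed names are first come, first served, as today;
            // claimed names can only be published by their owners.
            None => !self.owners.contains_key(name) || owns(name),
            // Namespaced name: owners of the root inherit ownership of every child,
            // so an unclaimed child under a missing or foreign root is refused.
            Some((root, _child)) => owns(root) || owns(name),
        }
    }
}

fn main() {
    let mut owners = HashMap::new();
    owners.insert("serde".to_string(), HashSet::from(["alice".to_string()]));
    let registry = Registry { owners };

    assert!(registry.can_publish("alice", "serde/derive"));
    assert!(!registry.can_publish("mallory", "serde/derive"));
    // Root names stay open to anyone, exactly as today.
    assert!(registry.can_publish("mallory", "serde-derive-2"));
}
```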

Thank you all in advance for what I hope will be a constructive and fruitful RFC!

Update: Discussions about separator choice are being conducted here.

Manishearth • Mar 09 '22 22:03

🤔... semi-serious idea: perhaps backslash could be used in code: foo\bar. That doesn't even tokenize at the moment, so it's fully backwards compatible. Too bad neither URLs nor TOML support those.

steffahn • Mar 09 '22 23:03

Thank you for preparing this! I'd like to second the support for ::: as both Cargo and path syntax, and maybe even go as far as to push back on the idea that it is confusing (or more confusing than other options). "This same thing looks different in Cargo.toml than in use statements" is a larger source of confusion for me and my lizard brain than most other things. I may be missing the expertise to see the true confusion of :::, though.

jklamer • Mar 09 '22 23:03

I don't have time to look through the RFC this moment, but I'm hoping this will be forward compatible with intra-package libraries? I described my thoughts a bit in this issue.

jhpratt • Mar 10 '22 00:03

@jhpratt i think so

Manishearth • Mar 10 '22 00:03

Cargo and rustc have different roles (bounded contexts) and I find it incredible we keep trying to shoehorn the same design into both.

Cargo must use full paths to namespaced crates for obvious reasons. However, this isn't the right design for use within rust code:

  • A single project only has a tiny subset of the entire crates eco-system. The chances of a name conflict are slim to none.
  • It's highly unlikely a user would need two competing implementations from two orgs within the same project.
  • Adding some sort of separator mapping to code adds redundant complexity (magic) to an already complicated language and would likely introduce edge cases - what if a crate name itself contains an underscore?
  • Duplicating the full crate namespace for each use statement is redundant verbiage. If the user wants to move a crate to another namespace, they'll need to touch many places in the code instead of just a single line in the cargo.toml configuration.
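To illustrate the underscore concern: a sketch assuming a hypothetical rule that maps both `/` and `-` to `_` when passing the name to rustc (Cargo already does this for `-`). Two distinct packages end up with the same identifier.

```rust
/// Hypothetical mapping from a package name to the identifier rustc would see.
/// Cargo already turns `-` into `_`; this sketch also turns the `/` separator into `_`.
fn naive_crate_ident(package: &str) -> String {
    package.replace(|c: char| c == '/' || c == '-', "_")
}

fn main() {
    let a = naive_crate_ident("foo/bar_baz");
    let b = naive_crate_ident("foo_bar/baz");
    // Two different packages, one identifier: rustc could not tell them apart.
    assert_eq!(a, "foo_bar_baz");
    assert_eq!(a, b);
}
```

A rename in Cargo.toml resolves the clash for a single project, but the default mapping on its own is ambiguous.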

To summarise, cargo should be the sole manager of this concern and this should not bleed into code. I should be able to add a dependency of "foo/bar/baz" to cargo, and expect to use "baz" as the crate name in rust code. In the unlikely event there is a name conflict, I should be able to rename the crate in cargo config; that should be an already existing feature of cargo.

For example, a company might choose a naming scheme like "Company/product/libfoo" and "Company/product/bar", where bar depends on libfoo.

What happens if the company decides to rebrand "product"? Seems to me this is redundant busy-work to go and update all references to libfoo when this could have been a simple config file change.

Final edit: To preempt possible "OSS" objections to this "commercial" use case as invalid:

  • Enterprise adoption is already happening and is crucial to Rust's success
  • This is as much about the implementation of cargo as it is about crates.io, and cargo already supports 3rd party registries in stable.

yigal100 • Mar 10 '22 00:03

@yigal100 I think you have misunderstood the problem people are attempting to resolve by making the syntax the same in Cargo and rustc.

I'm also going to somewhat preemptively ask you to tone it down a bit. Using language like "I find it incredible that..." is feigning surprise and is in general not a constructive way to engage. Consider that you may have misunderstood, or that the people you are discussing with hold different values. If you find this pushback to be harsh, bear in mind that this topic has an extremely hard time being discussed, and even minor unconstructive comments can and will snowball. Either people are on their best behavior or this discussion goes nowhere for the millionth time, and I truly wish for the former.

I am super sympathetic to the goal of separating rustc and Cargo here (indeed, the current RFC draft does not propose any rustc changes, just talks about them in the alternatives: I share this goal to some degree!), however I urge folks to be constructive when talking about it.

To address some points you brought up specifically:

What happens if the company decides to rebrand "product"? Seems to me this is redundant busy-work to go and update all references to libfoo when this could have been a simple config file change.

This already exists. You can set the name Cargo passes down to rustc.

  • A single project only has a tiny subset of the entire crates eco-system. The chances of a name conflict are slim to none. It's highly unlikely a user would need two competing implementations from two orgs within the same project.

The argument is more about the different syntaxes being confusing (people already have trouble with - and _). At no point is it said that conflicts are the problem (conflicts are resolved by the fact that you can set the name rustc sees in Cargo.toml); you are addressing a straw man here. Conflicts are talked of as a problem only inasmuch as they enable dash typosquatting, where malicious people will take foo-bar and hope that people looking for foo/bar will sometimes make a mistake.
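The typosquatting concern can be sketched as a registry-side check. As I understand it, crates.io already treats `-` and `_` (and letter case) as equivalent when deciding whether a new name collides with an existing one; the hypothetical `canonical` helper below extends that to the `/` separator, so `foo-bar` would be flagged once `foo/bar` exists. The helper names are illustrative only.

```rust
/// Normalize a name for collision checks: case-insensitive, with `-`, `_`,
/// and the (hypothetical) `/` separator all treated as the same character.
fn canonical(name: &str) -> String {
    name.chars()
        .map(|c| match c {
            '-' | '/' => '_',
            other => other.to_ascii_lowercase(),
        })
        .collect()
}

/// Returns the existing name that `candidate` would be confused with, if any.
fn confusable_with<'a>(existing: &[&'a str], candidate: &str) -> Option<&'a str> {
    let key = canonical(candidate);
    existing.iter().copied().find(|name| canonical(name) == key)
}

fn main() {
    let existing = ["foo/bar", "serde"];
    assert_eq!(confusable_with(&existing, "foo-bar"), Some("foo/bar"));
    assert_eq!(confusable_with(&existing, "Foo_Bar"), Some("foo/bar"));
    assert_eq!(confusable_with(&existing, "foo-baz"), None);
}
```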

Duplicating the full crate namespace for each use statement is redundant verbiage. If the user wants to move a crate to another namespace, they'll need to touch many places in the code instead of just a single line in the cargo.toml configuration.

This argument already applies to renaming crates in general; and is not a new problem. Furthermore, this RFC doesn't quite propose any form of "moving" crates between namespaces in the first place. Finally, as already mentioned, there are ways to handle crate renames in just Cargo.toml.

Manishearth • Mar 10 '22 00:03

@Manishearth:

What happens if the company decides to rebrand "product"? Seems to me this is redundant busy-work to go and update all references to libfoo when this could have been a simple config file change.

This already exists. You can set the name Cargo passes down to rustc.

We're in agreement here as I've mentioned the same point myself. My comment is really about the default behaviour. Defaults matter a lot for ergonomic reasons and the best default here is to map to the leaf crate name and not pass along the full path under some transformation.

As you noted yourself, the current scheme is already confusing people with '-' vs. '_' and imho we shouldn't add to that an additional transformation.

yigal100 • Mar 10 '22 01:03

I consider removing the root crate name to also be a transformation that can potentially be confusing.

A thing that I might not have mentioned in the RFC (I'll edit it in now; it'll largely have the text I'm putting below) is that there are, broadly speaking, two ways this feature may be used.

One of the ways is for one organization (say, unicode) to release a lot of crates under the namespace as a way of asserting organizational ownership. Think unicode/segmentation, unicode/line-breaking, unicode/script, etc. In such a case, using just the leaf crate name makes a lot of sense.

However, there's another case: where a project wishes to use namespaces to talk about a related set of crates. For example, a very common use case I see is things like serde/derive and icu/provider, where "derive" and "provider" on their own are rather generic and would be prone to clashes (what derive? provider of what?)

It's still a valid design to just pick leaf nodes and ask people to rename, however. I'll mention that in my edit; I feel like I talked about this at some stage of the discussions, but it may not have made it into the RFC.
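Under a leaf-name default, the clash shows up as soon as one manifest depends on two crates whose leaves are both `derive`. A minimal sketch, assuming `/` as the separator; `leaf_ident` and `clashing_leaves` are hypothetical helpers showing how a tool could tell the user which dependencies need a rename. The `clap/derive` name is illustrative, not a real package.

```rust
use std::collections::BTreeMap;

/// Leaf policy: `serde/derive` -> `derive`, with `-` mapped to `_` as Cargo does today.
fn leaf_ident(package: &str) -> String {
    let leaf = package.rsplit('/').next().unwrap_or(package);
    leaf.replace('-', "_")
}

/// Returns each leaf identifier claimed by more than one dependency,
/// together with the packages that would have to be renamed.
fn clashing_leaves<'a>(deps: &[&'a str]) -> BTreeMap<String, Vec<&'a str>> {
    let mut by_leaf: BTreeMap<String, Vec<&'a str>> = BTreeMap::new();
    for &dep in deps {
        by_leaf.entry(leaf_ident(dep)).or_default().push(dep);
    }
    by_leaf.retain(|_, pkgs| pkgs.len() > 1);
    by_leaf
}

fn main() {
    let deps = ["serde/derive", "clap/derive", "unicode/segmentation"];
    let clashes = clashing_leaves(&deps);
    assert_eq!(clashes.len(), 1);
    assert_eq!(clashes["derive"], vec!["serde/derive", "clap/derive"]);
    // Organization-style names keep working fine as bare leaves.
    assert_eq!(leaf_ident("unicode/segmentation"), "segmentation");
}
```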

Manishearth • Mar 10 '22 01:03

@yigal100 I've added a section. I do somewhat like the leaf approach and it was closer to my original idea but I found in feedback that people who wanted to use it for the "project" use case did not quite enjoy it. But it's now an explicit section where this topic can be discussed further.

Manishearth • Mar 10 '22 01:03

A minor note on the question of the separator:

Using :: there not only aligns well with existing Rust syntax generally, but I think also aligns well with the 2018 edition's use path changes. While in the 2015 edition the crate root is the top-level :: namespace, the 2018 edition pushes the current crate down a level into crate:: to make room for other crates to live next to it. Using :: as the separator would take this one step further, and allow that new top level namespace to contain package groups as well as individual crates.

Long term, migrating to a separator that is the same in Rust source code might even reduce some of the confusion around hyphens: crates that use them for namespacing would no longer need any crate name translation from Cargo.

rpjohnst • Mar 10 '22 01:03

Agreeing with @rpjohnst, my opinion on how to handle use without syntax changes would be something along the lines of this:

  • Check for a namespace first and prioritize it, i.e. the namespaced crate serde::derive takes priority over the module derive in the root crate serde
  • Reverse this logic and prioritize root namespace first if the use is prefixed by a ::

So,

use serde::derive;   // resolves to namespace "serde", crate "derive"
use ::serde::derive; // resolves to root namespace, crate "serde", module "derive"

Ultimately I see no better way to resolve this without parser/syntax changes, which could require another Rust edition if I'm not mistaken?
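The two rules above can be modeled as a lookup order. This is a hypothetical sketch, not how rustc's resolver actually works: it only inspects the first two path segments, and the two sets stand in for the namespaced and root crates Cargo would pass down.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Resolution<'a> {
    /// A crate inside a namespace, e.g. namespace `serde`, crate `derive`.
    NamespacedCrate { namespace: &'a str, krate: &'a str },
    /// An item (such as a module) inside a root crate.
    RootItem { krate: &'a str, item: &'a str },
    Unresolved,
}

/// Resolve the first two segments of a `use` path under the proposed rules:
/// without a leading `::` a namespaced crate wins; with one, only root crates count.
fn resolve<'a>(path: &'a str, namespaced: &HashSet<String>, roots: &HashSet<String>) -> Resolution<'a> {
    let (root_only, rest) = match path.strip_prefix("::") {
        Some(rest) => (true, rest),
        None => (false, path),
    };
    let mut segments = rest.split("::");
    let (Some(first), Some(second)) = (segments.next(), segments.next()) else {
        return Resolution::Unresolved;
    };
    if !root_only && namespaced.contains(&format!("{first}/{second}")) {
        Resolution::NamespacedCrate { namespace: first, krate: second }
    } else if roots.contains(first) {
        Resolution::RootItem { krate: first, item: second }
    } else {
        Resolution::Unresolved
    }
}

fn main() {
    let namespaced: HashSet<String> = HashSet::from(["serde/derive".to_string()]);
    let roots: HashSet<String> = HashSet::from(["serde".to_string()]);

    assert_eq!(
        resolve("serde::derive", &namespaced, &roots),
        Resolution::NamespacedCrate { namespace: "serde", krate: "derive" }
    );
    assert_eq!(
        resolve("::serde::derive", &namespaced, &roots),
        Resolution::RootItem { krate: "serde", item: "derive" }
    );
}
```

With both `serde` and `serde/derive` present, adding or dropping the leading `::` changes what the path refers to, which is the situation the macro concern in the next comment is about.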

Absolucy • Mar 10 '22 02:03

use serde::derive;   // resolves to namespace "serde", crate "derive"
use ::serde::derive; // resolves to root namespace, crate "serde", module "derive"

There are scenarios where people would want to refer to module "derive" of crate "serde" without the leading ::, especially in macros. If you look at clap, we do this exact thing in order to let clap be a dependency of another library.

pksunkara • Mar 10 '22 03:03

Would folks proposing :: be willing to come up with a proposal for :: that includes all the different aspects that need to be addressed? I'd also love to handle it with :: but I'm worried about confusion and ambiguity; and in the past I recall us having a hard time coming up with something that worked with ::.

Manishearth • Mar 10 '22 03:03

If separators are going to be a major discussion I'm also happy to direct people to https://github.com/Manishearth/namespacing-rfc/issues/1 and/or https://github.com/Manishearth/namespacing-rfc/issues/2 and request in-depth discussion occur there.

Manishearth • Mar 10 '22 03:03

@Manishearth I posted a sketch of a proposal for :: at https://github.com/Manishearth/namespacing-rfc/issues/1#issuecomment-1063749649 . Happy to continue there.

joshtriplett • Mar 10 '22 07:03

Yeah, let's consolidate discussion on separator choice there. I've updated the RFC with links to the issue, further comments about separator choice may be hidden and redirected there instead (until we reach consensus there).

Manishearth • Mar 10 '22 07:03

I will declare myself a very strong opponent of namespace packages, while at the same time understanding that there is very strong support for them. Please take my comments here as input on what I believe the RFC is currently not capturing; they summarize my general concern with namespace packages.

The entire premise and idea of a namespace package is that you can "trust" a package to belong to the org it's scoped under. In a way, it achieves this by having the package index (in this case crates.io) enforce this type of ownership on the index side. However, by doing so it also creates a general compatibility hazard, as part of the ownership conversation is now in the name of the package.

In particular a package might want to move in and out of a scope over time as ownership changes. For instance a package might be at one point independently maintained, then moves under an org (say tokio) just to be recognized later as no longer being a core component and having to move out again. In all these cases you now end up with one of the following options:

  1. every dependency needs to change with the move of the package
  2. the strong "membership of org" relationship on the index gets weakened by again giving individual access to packages
  3. the index learns in some sense about redirects of packages to new locations
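Option 3 would require the index to follow moves safely. A minimal sketch, assuming a hypothetical old-name-to-new-name redirect table (the index format has no such thing today), that follows a chain of moves and refuses to loop; the `tokio/foo` and `foo-ng` names are made up for the example.

```rust
use std::collections::{HashMap, HashSet};

/// Follow hypothetical index redirects (old name -> new name) until reaching a
/// name that has not moved. Returns `None` if the redirects form a cycle.
fn follow_redirects<'a>(redirects: &HashMap<&'a str, &'a str>, start: &'a str) -> Option<&'a str> {
    let mut seen = HashSet::new();
    let mut current = start;
    while let Some(&next) = redirects.get(current) {
        // Revisiting a name means the chain loops back on itself.
        if !seen.insert(current) {
            return None;
        }
        current = next;
    }
    Some(current)
}

fn main() {
    // A crate that moved into an org namespace and later moved out again.
    let redirects = HashMap::from([("foo", "tokio/foo"), ("tokio/foo", "foo-ng")]);
    assert_eq!(follow_redirects(&redirects, "foo"), Some("foo-ng"));
    assert_eq!(follow_redirects(&redirects, "bar"), Some("bar"));

    let looping = HashMap::from([("a", "b"), ("b", "a")]);
    assert_eq!(follow_redirects(&looping, "a"), None);
}
```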

This is even worse when organizations that control multiple packages become abandoned and disband. For instance, at one point Georg Brandl and I released all our open source work together under a "Pocoo" org. However, as time went on and we had different plans in life, we figured it's better for us to go separate ways. As such, Sphinx and Pygments, for instance, became independent packages maintained by different people, and my libraries went under a new org called Pallets, which now maintains many of these. At the same time, some of my packages moved into yet another org because there was no longer such a strong desire to keep these around under the same branding.

All of this is to say: organizations as part of the package names get very tricky as time goes on and the concept of ownership and maintainership shifts. What I instead wish Cargo would adopt is the general idea that we could have organizations on crates.io with trusted signatures. That way we as users could start trusting orgs independently of what packages are named, rather than trying to shoehorn this into the naming scheme. For instance, I would love it if I could trust the Tokio folks to maintain packages; they are very trustworthy. But as a user I do not care what they name their packages. They could give them UUIDs for all I care.

This is a long way to say that this RFC does not actually address what this is trying to solve other than to appeal to some general "of course we need namespace packages" consensus that has been growing over the years. I believe by disregarding that there are some benefits to namespace packages the discussion has been completely poisoned into a very binary "yes or no" to this topic, rather than trying to find optional solutions for the particular problems that namespace packages are supposedly solving.

mitsuhiko • Mar 10 '22 12:03

In particular a package might want to move in and out of a scope over time as ownership changes. For instance a package might be at one point independently maintained, then moves under an org (say tokio) just to be recognized later as no longer being a core component and having to move out again. In all these cases you now end up with one of the following options:

1. every dependency needs to change with the move of the package

2. the strong "membership of org" relationship on the index gets weakened by again giving individual access to packages

3. the index learns in some sense about redirects of packages to new locations

If we do continue forward with the namespacing, we should allow flagging a crate as being a successor to another crate so tools (cargo upgrade, dependabot, etc) can recommend migrating to the new crate. For example, if we make clap-serde official as clap/serde, then we should be able to mark it as the upgrade path. In practice, we are hitting this right now with trying to find ways to tell structopt users that clap is the upgrade path. We've even considered doing a breaking release that is purely meant to break people's builds just so they'll get the message when going through the standard upgrade procedures.
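A tool-side sketch of the successor idea, assuming a hypothetical `successor` field in registry metadata (no such field exists today; the name and shape are illustrative). The tool scans a dependency list and emits a hint for each crate that names a successor:

```rust
use std::collections::HashMap;

/// Hypothetical registry metadata, reduced to the one field this sketch needs.
struct CrateMeta {
    successor: Option<&'static str>,
}

/// Produce an upgrade hint for each dependency whose metadata names a successor.
fn upgrade_hints(deps: &[&str], registry: &HashMap<&str, CrateMeta>) -> Vec<String> {
    deps.iter()
        .filter_map(|&dep| {
            let next = registry.get(dep)?.successor?;
            Some(format!("`{dep}` names a successor: consider migrating to `{next}`"))
        })
        .collect()
}

fn main() {
    let registry = HashMap::from([
        ("structopt", CrateMeta { successor: Some("clap") }),
        ("clap", CrateMeta { successor: None }),
    ]);
    let hints = upgrade_hints(&["structopt", "clap"], &registry);
    assert_eq!(hints.len(), 1);
    assert!(hints[0].contains("`clap`"));
}
```

Unlike a rename redirect, a successor hint never changes what resolves; it only nudges users during upgrades, which fits the structopt to clap case.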

epage • Mar 10 '22 13:03

@epage I'd quite like this, but I think the design space for that is pretty large (in particular, there's a difference between "renaming a crate" and "moving development effort into a different crate that may do more stuff (like structopt -> clap)"), so I'm hoping that can be a separate RFC (that I don't have to write). But I am trying to keep the door open for that in this RFC!

Manishearth • Mar 10 '22 17:03

@mitsuhiko As mentioned in the RFC and earlier in this thread, broadly speaking, there are two use cases for this. There is the "organization" use case where an "organization" wishes to talk about ownership for related but seemingly disjoint projects. This use case does have the problems you speak of.

However, there is also the "project" use case, where something that is logically a single piece of software is being developed as multiple crates. It's quite rare for things to want to move out of a project. To me, this is the primary use case of this RFC, though I'm aware other people have different desires.

Overall I would caution against looking at this feature as if it were just like GitHub organizations. In particular, I often see this feature being applied to crates that are developed in the same repository. I'm happy to update the RFC to this effect if you feel this is not being captured well; I attempted to do so in an earlier commit already and I can expand on that.

other than to appeal to some general "of course we need namespace packages" consensus that has been growing over the years

I strongly oppose this characterization; I'm sure the motivation could be expanded upon but there is no text to this effect in this RFC.

Manishearth • Mar 10 '22 17:03

I'm happy to update the RFC to this effect if you feel this is not being captured well; I attempted to do so in an earlier commit already and I can expand on that.

@Manishearth though I'm not the one you are replying to, I would appreciate updating it. As I read it, ownership and the associated trust for that ownership is the primary motivation (e.g. "Regardless, it is nice to have a way to signify 'these are all crates belonging to a single organization, and you may trust them the same'"). I feel like narrowing the scope to "logically a single piece of software" (e.g. serde and serde-derive) has a major effect on the dialogue.

epage • Mar 10 '22 18:03

Fwiw the "single piece of software" will be wholly covered by the proposal I linked to earlier. I'm hoping to provide additional clarity on this point soon.

jhpratt • Mar 10 '22 18:03

Please don't change the directory structure of the crates index. Storing namespaced crates under their parent's directory is neat, but needlessly adds more directory traversal logic.

I suggest a rule like "replace all separator characters with 🚲 characters, and then generate the directory prefix as usual", so e.g. "a/b" would be "a@b" and land in "3/a/a@b", and "ab/cd" would be "ab@cd" => "ab/@c/ab@cd". While this may look silly, it preserves the existing directory splitting logic.
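The existing splitting logic this suggestion preserves can be sketched as follows. The length-based buckets (`1/`, `2/`, `3/{first char}/`, and `{first two}/{next two}/` for longer names, with names lowercased) follow Cargo's documented index layout; replacing `/` with `@` is the placeholder from the comment, and `index_path` is a hypothetical helper, not a Cargo API.

```rust
/// Map a (possibly namespaced) package name to its file path in the index.
/// `sep_replacement` stands in for whatever character the separator becomes.
fn index_path(name: &str, sep_replacement: char) -> String {
    // Flatten the separator first, then apply the usual length-based rule.
    let flat: String = name
        .chars()
        .map(|c| if c == '/' { sep_replacement } else { c.to_ascii_lowercase() })
        .collect();
    let chars: Vec<char> = flat.chars().collect();
    let prefix = match chars.len() {
        0 => panic!("package names are never empty"),
        1 => "1".to_string(),
        2 => "2".to_string(),
        3 => format!("3/{}", chars[0]),
        _ => format!(
            "{}/{}",
            chars[..2].iter().collect::<String>(),
            chars[2..4].iter().collect::<String>()
        ),
    };
    format!("{prefix}/{flat}")
}

fn main() {
    assert_eq!(index_path("a/b", '@'), "3/a/a@b");
    assert_eq!(index_path("ab/cd", '@'), "ab/@c/ab@cd");
    assert_eq!(index_path("serde", '@'), "se/rd/serde");
}
```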

kornelski • Mar 10 '22 18:03

@jhpratt I don't think it does, your proposal there seems to be more for internal crates; this is not about internal crates; this is about multiple crates in a single project. This is also about crate names, not coherence (names only matter for non-internal crates).

Proposals like the one you've posted there (allow publishing multiple crates that are versioned together with relaxed coherence) have been discussed in the past and are something I think various team members are definitely in favor of. Overall I consider such changes not to address the needs here (around crate naming), but I do think such proposals pair well with this one.

Manishearth • Mar 10 '22 18:03

@kornelski let's hold off on discussing the exact representation in the registry trie until https://github.com/Manishearth/namespacing-rfc/issues/1 is resolved. The direction that discussion is going might very well render this moot.

Manishearth • Mar 10 '22 18:03

@epage I've added a clearer motivation

Manishearth • Mar 10 '22 18:03

@Manishearth thanks for the update!

It still feels a bit unclear though. Let's explore a concrete example. Currently in the clap workspace, we have

  • clap
  • clap_derive
  • clap_complete (depends on clap)
  • clap_mangen (depends on clap)

External to clap, we have

  • clap_serde (third-party)
  • clap_verbosity_flags (maintained by me)
  • ...

What would be the expectation or guidance on when something is the same "project" vs. the same "organization"? Breaking the above crates down:

  • clap + clap_derive seems like an obvious case for same project
  • clap_complete, clap_mangen, and clap_serde all have a tight coupling to clap and any design decision made in clap has direct ramifications on these. These seem like they'd be part of the same project but we have the potential for clap_serde and crates like it to come and go from an ownership perspective.
  • clap_verbosity_flags just exists and clap changes have little impact on it except major breaking changes, so I'm assuming this would just be part of the "organization".

If my gut feel on clap_serde is correct, that the guidelines suggest we could bless it as clap/serde and could later demote it again to clap_serde, then it feels like the current motivation still runs into same problem as "organizations". I worry the only way we can avoid it is if we narrow the scope down to just clap + clap_derive.

epage • Mar 10 '22 18:03

@epage in general i think i would include "does it make sense for this crate to live elsewhere" in my personal test for whether something should or shouldn't be under the project namespace. For example, the icu repo does contain a bunch of utilities like zerovec but they wouldn't need to exist under icu/. Similarly, servo/layout feels rather different to me from servo/url (useful for many other reasons).

Maybe just clap and clap/derive is the answer for y'all! I don't know, I'm not a clap maintainer, and I think my point here is that it really depends on what your needs are here. I don't know what clap_complete and clap_mangen do; clap_mangen seems like a related tool and I feel like it could live in either place. The goal of this RFC is not to neatly slot everything into a namespace; that's why it's optional. It's not a "problem" if something doesn't manage to slot in, IMO.

Manishearth • Mar 10 '22 19:03

@Manishearth However, there is also the "project" use case, where something that is logically a single piece of software is being developed as multiple crates. It's quite rare for things to want to move out of a project. To me, this is the primary use case of this RFC, though I'm aware other people have different desires.

IMO the "project" use case requires a completely different solution and it does not actually relate to package naming at all. Today the biggest challenge for crates split into dependent crates is largely the tooling to publish, to inter-depend, and to hide these internal crates from users. That all seems to be entirely a tooling issue, plus maybe some extra metadata to "hide" a package. For instance, we have a crate called symbolic which is split into many sub-crates. When we originally created it, we wanted to hide the sub-crates entirely and use feature flags to control how much of symbolic is available. We mostly split the crates because of compile time concerns.

I do not see how having a controlled prefix actually helps us in any case here. In fact it's even trickier, because the dependencies are entirely unclear. For deser, the dependency graph is: deser depends on deser_derive, and deser_json depends on deser and deser_derive. If you asked me today to make a decision about which of these should be under a prefix, I could not tell you. Having to make this decision at the time of naming the crate seems entirely wrong to me. I would like to make this decision at the time of publishing a crate.
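Part of the publishing-tooling gap is just ordering: a crate can only be published once the versions it depends on are already on the index. A minimal sketch using Kahn's topological sort over the deser graph described above; the dependencies are hard-coded here (a real tool might read them from `cargo metadata`), and every listed dependency is assumed to be another crate in the same graph.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Order crates so each one comes after all of its dependencies (Kahn's algorithm).
/// Returns `None` if the dependency graph contains a cycle.
fn publish_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<&'a str>> {
    // Number of not-yet-published dependencies for each crate.
    let mut pending: BTreeMap<&'a str, usize> = deps.iter().map(|(&k, v)| (k, v.len())).collect();
    // Reverse edges: dependency -> crates that depend on it.
    let mut dependents: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for (&krate, ds) in deps {
        for &d in ds {
            dependents.entry(d).or_default().push(krate);
        }
    }
    // Crates with no in-graph dependencies can be published right away.
    let mut ready: VecDeque<&'a str> = pending
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(&k, _)| k)
        .collect();
    let mut order = Vec::new();
    while let Some(krate) = ready.pop_front() {
        order.push(krate);
        for &dependent in dependents.get(krate).into_iter().flatten() {
            let n = pending.get_mut(dependent)?;
            *n -= 1;
            if *n == 0 {
                ready.push_back(dependent);
            }
        }
    }
    // If some crates never became ready, they are stuck in a cycle.
    (order.len() == pending.len()).then_some(order)
}

fn main() {
    let deps = BTreeMap::from([
        ("deser", vec!["deser_derive"]),
        ("deser_derive", vec![]),
        ("deser_json", vec!["deser", "deser_derive"]),
    ]);
    assert_eq!(publish_order(&deps), Some(vec!["deser_derive", "deser", "deser_json"]));
}
```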

I welcome the idea of introducing tools to work with internal dependent crates but I think this requires entirely different solutions than what this RFC proposes.

mitsuhiko • Mar 10 '22 19:03

@mitsuhiko These are not internal crates. Maybe in the case of serde/derive, but not in the case of most of the other examples I've given.

Manishearth • Mar 10 '22 19:03