cargo
Accept hyphen in crate name in place of underscore
Crates.io currently accepts hyphens for crates that use underscores, both in the web interface and in the API.
Cargo does not, but should.
error: no matching package named `serde-codegen` found (required by `testing`)
location searched: registry https://github.com/rust-lang/crates.io-index
version required: *
Cargo is somewhat agnostic between - and _, but it doesn't consider the two characters equivalent. Crates decide whether they want - or _ to begin with and then they must be referenced through that name, disallowing usage of the other.
I'd personally prefer to not accept both serde-codegen and serde_codegen as it can be confusing to see two different values which mean the same thing in practice from time to time.
Makes sense, and I don't have a strong preference myself. I saw this comment from @steveklabnik and figured this would be a step toward tooling that reflects the conventions we would like developers to use.
Is there a rationale for crates.io vs cargo behaving differently from each other?
I'd personally prefer to not accept both serde-codegen and serde_codegen as it can be confusing to see two different values which mean the same thing in practice from time to time.
Fair, but I would rather see serde-codegen and serde_codegen and not need to care, vs see library-a and library_b and need to remember which one to use in each case. "They mean the same thing" is a thing you learn once, while library-a and library_b is a thing that will bite you for as long as you use Rust.
I don't really know why crates.io is agnostic, it wasn't originally and I think that was a patch added after the fact, would have to track that down.
Yeah it's easier to not have to remember, but to me it's more of a downside as it's disguising what's actually happening under the hood.
The crates.io change was https://github.com/rust-lang/crates.io/commit/89bc5dd46435ec08113c82266ad16464d68d4710.
I don't really know why crates.io is agnostic, it wasn't originally and I think that was a patch added after the fact, would have to track that down.
The current behavior of crates.io is defined by RFC 940:
Right now, crates.io compares package names case-insensitively. This means, for example, you cannot upload a new package named RUSTC-SERIALIZE because rustc-serialize already exists.
Under this proposal, we will extend this logic to identify - and _ as well.
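The comparison rule described in the RFC excerpt above can be sketched as a small canonicalization step. This is only an illustration: `canonical_name` and `names_collide` are hypothetical helpers, not functions from crates.io or Cargo, and the sketch assumes ASCII case folding is sufficient.

```rust
/// Canonical form used only for comparing names:
/// ASCII-lowercased, with every hyphen replaced by an underscore.
/// (Hypothetical helper, not a crates.io or Cargo API.)
fn canonical_name(name: &str) -> String {
    name.chars()
        .map(|c| if c == '-' { '_' } else { c.to_ascii_lowercase() })
        .collect()
}

/// Two names collide when their canonical forms are identical,
/// so only one of them could be published.
fn names_collide(a: &str, b: &str) -> bool {
    canonical_name(a) == canonical_name(b)
}
```

Under this rule `RUSTC-SERIALIZE` collides with `rustc-serialize`, and `serde-codegen` collides with `serde_codegen`, while the spelling chosen by the first publisher stays the stored name.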
I'd like to re-open this issue, I think this is a bug and we should fix it. (Maybe we can talk about it in a cargo meeting).
I'd personally prefer to not accept both serde-codegen and serde_codegen as it can be confusing to see two different values which mean the same thing in practice from time to time.
At first glance, this makes sense, but I think it doesn't hold as much water when you consider:
- In practice, the name almost always appears in your Cargo.toml exactly once, so it's not like they'll mean the same thing in the same file, just that some crates will write them differently.
- If the name contains - (which I believe is preferred), both names will already appear, one in the source and one in the manifest, because - is not valid in source.
In contrast, users who accidentally add serde-json always know what they meant, and because of the conversion users are taught that - and _ are interchangeable in package names. In practice, this seems to be a frustrating wart of cargo which is not adding any clarity for people & possibly even confusing them (if they don't guess that the reason adding error_chain failed was that it's actually called error-chain).
Reopened. I have also come to feel much more strongly about this in the past year.
I also agree this is worthwhile to fix.
Implementation-wise this won't be easy though, I think, as it'll require changes to the crate index. The changes in Cargo itself after that, though, are likely nominal.
@alexcrichton That would be a problem, why isn't it sufficient to change how we match the dependency name against the index file to be neutral to underscores (e.g. retry if there's no file with the characters swapped)?
(It'd be better if the index were normalized, but that seems quite challenging).
Right now we don't load the entire index in-memory and we currently also don't try to browse the entire index, rather given a crate we drill into exactly which file it's supposed to be. If we have - and _ normalization we'd have a set of filenames that would be the plausible right one, and we'd in theory have to try to check all of them. That's ok on first builds, but we'd need to ensure that if you've got a lock file that this fallback behavior doesn't happen a lot, as it could add up time-wise I think.
Makes sense, so I think what we should do:
- If the index file for the given ident doesn't exist, fall back by substituting underscores/hyphens in the name
- When generating the lock file, be sure to generate it with the index name, not the name in the toml
Is a divergence between the name in the lock and the toml going to be a problem?
EDIT: Also I'd like to write the PR for this to get more acquainted with cargo's codebase :)
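The fallback proposed above could look roughly like the following. It is a sketch under stated assumptions, not Cargo's implementation: a `HashSet` stands in for "does an index file with this name exist", and `separator_variants` and `resolve_index_name` are hypothetical names. The value returned is the spelling found in the index, which is the one the lock file would record.

```rust
use std::collections::HashSet;

/// All spellings of `name` obtained by choosing '-' or '_' at each separator.
/// Assumption: names with more than 16 separators are not expanded at all.
fn separator_variants(name: &str) -> Vec<String> {
    let positions: Vec<usize> = name
        .char_indices()
        .filter(|&(_, c)| c == '-' || c == '_')
        .map(|(i, _)| i)
        .collect();
    if positions.len() > 16 {
        return vec![name.to_string()];
    }
    let mut out = Vec::new();
    for mask in 0u32..(1u32 << positions.len()) {
        let mut bytes = name.as_bytes().to_vec();
        for (bit, &pos) in positions.iter().enumerate() {
            // Both separators are single ASCII bytes, so this keeps valid UTF-8.
            bytes[pos] = if ((mask >> bit) & 1) == 1 { b'_' } else { b'-' };
        }
        out.push(String::from_utf8(bytes).expect("separators are ASCII"));
    }
    out
}

/// Returns the name as spelled in the index, or None if no spelling exists.
fn resolve_index_name(index: &HashSet<String>, requested: &str) -> Option<String> {
    if index.contains(requested) {
        // Fast path: the manifest already uses the index spelling.
        return Some(requested.to_string());
    }
    // Slow path: probe the other spellings only after the exact one misses.
    separator_variants(requested)
        .into_iter()
        .find(|candidate| index.contains(candidate))
}
```

Because the exact spelling is tried first and the lock file stores the resolved name, the extra probes would only happen when the manifest spelling differs from the index and no lock entry exists yet.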
Yeah that sounds like it could work!
I think we'll have to make sure that a Dependency::name isn't compared to a PackageId::name, although we can perhaps either assume that doesn't happen or otherwise use separate types there if necessary. Sounds plausible at least!
If the index file for the given ident doesn't exist, fall back by substituting underscores/hyphens in the name
Is there a better way of doing this than brute force? Without changing the index in ways that break older cargos / existing projects?
Brute force takes exponential O(2^n) time, where n is the number of separators in the name, but that's not really a problem when n is almost never greater than two.
Why not go further and normalise all names to use hyphens -? The first step would be a new version of the query which only returns/finds normalised names; the second step would be a Cargo update to normalise then use the new version. The third step (a bit later perhaps) would be to only show the normalised names on crates.io.
Because the index would need to have both names so that pre-normalise cargo and post-normalise cargo can find it, and that makes for 2 sources of truth.
No it wouldn't if deployed via a new version of Cargo. Unfortunately this would not be backwards compatible (i.e. old versions of Cargo would require correct - vs _; new versions could accept either).
A much nicer solution would have been to restrict crate names to a whitelist of characters that contains only one of those two symbols from day 1. Alas, time travel is not really an option.
One thing that could be done to lessen the problem over time would be to stop crates.io allowing new crates with underscores (assuming hyphens are preferred).
I'd personally prefer to not accept both serde-codegen and serde_codegen as it can be confusing to see two different values which mean the same thing in practice from time to time.
Strongly agree! This is the sort of magic renaming nonsense that CSS has to do because they used - everywhere. I think Cargo should issue a warning for new crates that use - in the name.
I've actually wasted 10 minutes on this now because serde-transcode has a hyphen in it and I haven't worked out how to reference that from Rust code yet. You can't do use serde-transcode;. I'm sure there's a workaround but it's an annoying paper cut that I even have to think about it.
If everyone used underscores I wouldn't even have given it a second thought.
@Timmmm agreed that - and _ should not both be allowed and with your logic that _ is preferable (though I care little). However, any solution has to consider renaming existing crates, which is only viable if - and _ are interchangeable in Cargo.toml files and the - → _ rename is allowed on crates.io.
Here's what I would do (if I had infinite time etc.). Maybe some of these have been done already - I'm not sure.
- Add a warning to Cargo for crates with hyphens in the name that new crates with hyphens will not be accepted on crates.io soon.
- A bit later, stop accepting new crates on crates.io with hyphens. Also do not accept new underscore crate names that match an existing hyphenated crate.
- Find crates that only differ in _/-. Hopefully there aren't any. If there are, resolve by removing/renaming one of them.
- Accept underscores in Cargo.toml for crates that are actually named with hyphens, both for dependencies, and for authors updating existing crates (so they can rename them).
- Show every crate name on crates.io using underscores (translate the legacy hyphenated ones).
Lots of work though!
You can't do use serde-transcode;. I'm sure there's a workaround but it's an annoying paper cut that I even have to think about it.
The "workaround" is incredibly simple: you can do use serde_transcode; because all hyphens are transformed to underscores within Rust source code. If we don't have diagnostics that recommend this when you type use serde-transcode, we should.
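As a small illustration of that mapping, the helper below turns a package name as written in Cargo.toml into the identifier used in source. `crate_ident` is a hypothetical name for this sketch, not a Cargo or rustc function.

```rust
/// Maps a package name as written in Cargo.toml to the identifier used in
/// Rust source: a hyphen is not valid in an identifier, so each one becomes
/// an underscore. (Hypothetical helper for illustration only.)
fn crate_ident(package_name: &str) -> String {
    package_name.replace('-', "_")
}

// A dependency listed as `serde-transcode` in the manifest is therefore
// referenced in source as `use serde_transcode;`.
```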
Since crates.io already does not accept two packages with just - and _ different, why can't we make every place that accepts a package name to silently convert - to _ (or the other way round) internally? It is perfectly feasible that the preference of -/_ of the package author does not affect how users use it (just like whether the package author uses tabs or spaces doesn't affect whether users use tabs or spaces). This can be done with some algorithm similar to how case-insensitive systems handle the cases.
TL;DR: Why do we have to care if ignoring doesn't lead to problems?
In fact, it is even feasible to force change everything existing to - (or to _) and just silently (or with a warning) convert them when new crates are published.
BTW, I'm a bit confused. Are we talking about crate names or package names, or are they the same thing?
TL;DR: Why do we have to care if ignoring doesn't lead to problems?
That is the goal, and we've solved the problem in both rustc and crates.io but not in cargo, which is why this issue is open.
So why is this hard in cargo?
One complication is Alternative Registries: the existing RFC allows registries to have packages that only differ by - vs _. crates.io does not allow this, but other registries can do what they want. If we want to continue to support the RFC, then Cargo needs to keep the names as is and change all equality checks to equivalency checks. Tracking down every time we use eq or hash on a type that contains a name and finding a way to make it equivalency... it is going to be hard. (Also a deep well of corner cases.)
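To make "equivalency instead of equality" concrete, here is a minimal sketch of a name type that keeps the spelling as written but compares and hashes modulo -/_. `PkgName` is a hypothetical type invented for this example and does not correspond to any type in Cargo. The hard part described above is that every existing `eq` and `hash` on a name-bearing type would need this treatment.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical wrapper: stores the name exactly as written,
/// but treats '-' and '_' as the same character when comparing.
#[derive(Debug, Clone)]
struct PkgName(String);

impl PkgName {
    /// Characters of the name with every '-' mapped to '_'.
    fn normalized(&self) -> impl Iterator<Item = char> + '_ {
        self.0.chars().map(|c| if c == '-' { '_' } else { c })
    }
}

impl PartialEq for PkgName {
    fn eq(&self, other: &Self) -> bool {
        self.normalized().eq(other.normalized())
    }
}

impl Eq for PkgName {}

impl Hash for PkgName {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash the normalized characters so equivalent names hash equally,
        // which HashMap and HashSet require for correct lookups.
        for c in self.normalized() {
            c.hash(state);
        }
    }
}
```

With this, a `HashSet<PkgName>` containing `serde-codegen` also reports that it contains `serde_codegen`, while the original spelling remains available in the wrapped `String`.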
Even if we decide to break such (niche) uses of Alternative Registries, we will want a grace period. Some time where it will build with the wrong -/_ but give you a warning that older cargos won't know what you meant. I think this leads to the same implementation problems.
If someone has a way to make this work, I am open to helping make it happen.
@Eh2406 I found the reason why - / _ is all considered as an underscore here in crates.io. The sql function canon_crate_name replaces the hyphen with an underscore. If the replace function is removed, crates.io will allow registries to have packages that only differ by - vs _.
crates.io will allow registries to have packages that only differ by - vs _.
I don't think we want crates.io to allow packages that only differ by - vs _ so I've closed the associated PR; let me know if I've misunderstood.
I've posted https://internals.rust-lang.org/t/pre-rfc-unify-dashes-and-underscores-on-crates-io/13216/13. It's becoming more clear to me that this doesn't need an RFC, but I've posted the pre RFC anyway.
I don't actually think the exponential growth for names with a large number of separators is a problem. In those cases we can traverse the index trie, looking for both separators whenever there is one, and splitting the search if both exist. Note that this will not cause exponential growth unless there actually are crates with that combination of separator: it's unlikely that foo-bar-baz-quux-1 and foo-bar-baz_quux-2 and foo-bar_baz-quux-3 and so on all exist (and if they do it's probably an automated publish in violation of crates.io policy). In other words, it's only possible to engineer an exponentially bad situation here on purpose, it's not really possible as an accident.
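The traversal described above can be sketched with a toy in-memory trie. This is an assumption-laden illustration: the real index is a directory layout rather than a per-character trie, and `Node`, `find`, and `lookup` are hypothetical names. The search only branches at a separator when both the `-` child and the `_` child actually exist, so the work grows with the number of stored near-duplicates, not with 2^n.

```rust
use std::collections::HashMap;

/// Toy per-character trie standing in for the index layout.
#[derive(Default)]
struct Node {
    children: HashMap<char, Node>,
    is_crate: bool,
}

impl Node {
    fn insert(&mut self, name: &str) {
        let mut node = self;
        for c in name.chars() {
            node = node.children.entry(c).or_default();
        }
        node.is_crate = true;
    }

    /// Collects every stored name equal to `query` up to -/_ substitution.
    fn find(&self, query: &[char], prefix: &mut String, out: &mut Vec<String>) {
        match query.split_first() {
            None => {
                if self.is_crate {
                    out.push(prefix.clone());
                }
            }
            Some((&c, rest)) => {
                let both = ['-', '_'];
                let single = [c];
                // At a separator, try both spellings; otherwise only `c`.
                let options: &[char] = if c == '-' || c == '_' { &both } else { &single };
                for &opt in options {
                    // A missing child prunes that branch immediately.
                    if let Some(child) = self.children.get(&opt) {
                        prefix.push(opt);
                        child.find(rest, prefix, out);
                        prefix.pop();
                    }
                }
            }
        }
    }
}

/// Convenience wrapper returning all index spellings matching `name`.
fn lookup(root: &Node, name: &str) -> Vec<String> {
    let query: Vec<char> = name.chars().collect();
    let mut out = Vec::new();
    root.find(&query, &mut String::new(), &mut out);
    out
}
```

If more than one spelling comes back, as an alternative registry might allow, the caller would still have to decide which one was meant.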
Strongly agree! This is the sort of magic renaming nonsense that CSS has to do because they used - everywhere. I think Cargo should issue a warning for new crates that use - in the name.
I've actually wasted 10 minutes on this now because serde-transcode has a hyphen in it and I haven't worked out how to reference that from Rust code yet. You can't do use serde-transcode;. I'm sure there's a workaround but it's an annoying paper cut that I even have to think about it.
10 minutes?
I am new to Rust and have spent the last two hours working out how to, in my lib.rs, import a crate that somebody has named with a hyphen.
I think they named it that way because there's already one with the underscore name and so it is definitely a good way to distinguish between the two versions.