Consider NFC normalization of idents?
Raised by @Jules-Bertholet in https://github.com/rust-lang/rust/issues/120697
It could be desirable to NFC-normalize identifiers so as to remove these kinds of inconsistencies.
However, it seems plausible that this may cause issues if the identifiers are not local, e.g. they come from another crate that does not run rustfmt.
It may make sense, however, if rustc itself linted against non-NFC-normalized idents?
this may cause issues if the identifiers are not local, e.g. they come from another crate that does not run rustfmt
I don't see how?
@Jules-Bertholet Wait, maybe my brain is just slow (I'm overdue for lunch...), does rustc fully NFC-normalize idents internally when resolving them?
@Jules-Bertholet Wait, maybe my brain is just slow (I'm overdue for lunch...), does rustc fully NFC-normalize idents internally when resolving them?
Yes.
Oh, no issue then! In fact it would help motivate rustc linting since then we could just expect people to run rustfmt.
@Jules-Bertholet @workingjubilee Do either of you know if there are crates that already implement the normalization?
Rustc uses unicode-normalization:
https://github.com/rust-lang/rust/blob/0d531351e848ad69a03c704d40985c9003847427/compiler/rustc_parse/src/lexer/mod.rs#L771-L780
@Jules-Bertholet Ahhh nice! Very good to know that something like this already exists. Thanks for the link.
However, it seems plausible that this may cause issues if the identifiers are not local, e.g. they come from another crate that does not run rustfmt.
Just want to make sure I understand correctly. Neither of you think this would be an issue?
Just want to make sure I understand correctly. Neither of you think this would be an issue?
Correct, AIUI. Even if a dependency uses non-NFC idents internally, they can be referred to by dependents with their NFC forms.
Just want to make sure I understand correctly. Neither of you think this would be an issue?
Yes. I was a bit slow on the uptake and hadn't remembered yet that we NFC idents (because of course we should). Applying NFC on format means that the visual appearance of the normalized identifiers will not be changed as a result.
Thank you both for the confirmation. In that case, I think we could add an opt-in configuration option to normalize idents.
I'm pretty sure all idents get rewritten in rewrite_ident:
https://github.com/rust-lang/rustfmt/blob/ead0fc9529ef787bc0e3f7f1b92dccd00cf45915/src/utils.rs#L27-L29
Probably makes sense to return a Cow<'a str> in case we don't need to normalize anything. If either of you (or anyone else) wants to work on this you can find docs for adding new configuration options here, and I'd be happy to review a PR.