rustfmt icon indicating copy to clipboard operation
rustfmt copied to clipboard

Consider NFC normalization of idents?

Open workingjubilee opened this issue 1 year ago • 10 comments

Raised by @Jules-Bertholet in https://github.com/rust-lang/rust/issues/120697

It could be desirable to NFC-normalize identifiers so as to remove these kinds of inconsistencies.

However, it seems plausible that this may cause issues if the identifiers are not local, e.g. they come from another crate that does not run rustfmt.

It may make sense, however, if rustc itself linted against non-NFC-normalized idents?

workingjubilee avatar Feb 06 '24 21:02 workingjubilee

this may cause issues if the identifiers are not local, e.g. they come from another crate that does not run rustfmt

I don't see how?

Jules-Bertholet avatar Feb 06 '24 21:02 Jules-Bertholet

@Jules-Bertholet Wait, maybe my brain is just slow (I'm overdue for lunch...), does rustc fully NFC-normalize idents internally when resolving them?

workingjubilee avatar Feb 06 '24 21:02 workingjubilee

@Jules-Bertholet Wait, maybe my brain is just slow (I'm overdue for lunch...), does rustc fully NFC-normalize idents internally when resolving them?

Yes.

Jules-Bertholet avatar Feb 06 '24 21:02 Jules-Bertholet

Oh, no issue then! In fact it would help motivate rustc linting since then we could just expect people to run rustfmt.

workingjubilee avatar Feb 06 '24 21:02 workingjubilee

@Jules-Bertholet @workingjubilee Do either of you know if there are crates that already implement the normalization?

ytmimi avatar Feb 06 '24 22:02 ytmimi

Rustc uses unicode-normalization:

https://github.com/rust-lang/rust/blob/0d531351e848ad69a03c704d40985c9003847427/compiler/rustc_parse/src/lexer/mod.rs#L771-L780

Jules-Bertholet avatar Feb 06 '24 23:02 Jules-Bertholet

@Jules-Bertholet Ahhh nice! Very good to know that something like this already exists. Thanks for the link.

However, it seems plausible that this may cause issues if the identifiers are not local, e.g. they come from another crate that does not run rustfmt.

Just want to make sure I understand correctly. Neither of you think this would be an issue?

ytmimi avatar Feb 07 '24 02:02 ytmimi

Just want to make sure I understand correctly. Neither of you think this would be an issue?

Correct, AIUI. Even if a dependency uses non-NFC idents internally, they can be referred to by dependents with their NFC forms.

Jules-Bertholet avatar Feb 07 '24 02:02 Jules-Bertholet

Just want to make sure I understand correctly. Neither of you think this would be an issue?

Yes. I was a bit slow on the uptake and hadn't remembered yet that we NFC idents (because of course we should). Applying NFC on format means that the visual appearance of the normalized identifiers will not be changed as a result.

workingjubilee avatar Feb 07 '24 03:02 workingjubilee

Thank you both for the confirmation. In that case, I think we could add an opt-in configuration option to normalize idents.

I'm pretty sure all idents get rewritten in rewrite_ident: https://github.com/rust-lang/rustfmt/blob/ead0fc9529ef787bc0e3f7f1b92dccd00cf45915/src/utils.rs#L27-L29

Probably makes sense to return a Cow<'a str> in case we don't need to normalize anything. If either of you (or anyone else) wants to work on this you can find docs for adding new configuration options here, and I'd be happy to review a PR.

ytmimi avatar Feb 07 '24 22:02 ytmimi