Grammar for "identifier or keyword" seems wrong
Based on the grammar reference for identifiers, the '_' identifier should be invalid, since
_ is not XID_Start XID_Continue*, and it's not '_' XID_Continue+ either (it's not followed by at least one XID_Continue).
I think the rule should be:
IDENTIFIER_OR_KEYWORD :
(XID_Start | '_') XID_Continue*
or:
IDENTIFIER_OR_KEYWORD :
XID_Start XID_Continue*
| '_' XID_Continue*
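To make the difference between the two rules concrete, here is a minimal sketch of both as matchers. It is not how rustc or the reference implements anything: the function names are made up for illustration, and XID_Start / XID_Continue are approximated with ASCII-only checks (the real Unicode properties cover far more characters).

```rust
// ASCII-only stand-ins for the Unicode XID_Start / XID_Continue properties.
// This is an assumption made to keep the sketch dependency-free.
fn is_xid_start(c: char) -> bool {
    c.is_ascii_alphabetic()
}

fn is_xid_continue(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Rule as written in the reference:
///   XID_Start XID_Continue* | '_' XID_Continue+
fn matches_reference_rule(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if is_xid_start(c) => chars.all(is_xid_continue),
        Some('_') => {
            // At least one XID_Continue must follow the underscore.
            let rest = chars.as_str();
            !rest.is_empty() && rest.chars().all(is_xid_continue)
        }
        _ => false,
    }
}

/// Rule proposed in this issue:
///   (XID_Start | '_') XID_Continue*
fn matches_proposed_rule(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if is_xid_start(c) || c == '_' => chars.all(is_xid_continue),
        _ => false,
    }
}

fn main() {
    for s in ["foo", "_foo", "__", "_", "1x"] {
        println!(
            "{:?}: reference = {}, proposed = {}",
            s,
            matches_reference_rule(s),
            matches_proposed_rule(s)
        );
    }
}
```

Under these assumptions the two rules only disagree on a lone `_`: the reference rule rejects it and the proposed rule accepts it.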
_ is indeed meant to not be a valid identifier. It is categorized as punctuation instead. As such,
IDENTIFIER_OR_KEYWORD :
XID_Start XID_Continue*
| _ XID_Continue+
really is correct.
@bjorn3 Hmm, I made this playground, and it seems that at least the proc-macro APIs parse "_" as an ident. So either the reference is wrong or the proc-macro API is.
The same thing can be verified locally with proc-macro instead of proc-macro2, but you'll need to define a macro.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=345fe7f26edd72f2c8a95b1c7fd74535
Looks like the lexer indeed considers _ to be an identifier, and the parser considers it to be a keyword. (All keywords are identifiers as far as the lexer is concerned.)
IDENTIFIER_OR_KEYWORD :
(XID_Start | '_') XID_Continue*
is closest to the actual logic the lexer implements.
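As a rough illustration of that two-stage behavior (and not the actual rustc code), the sketch below first lexes an identifier-like prefix using the permissive rule, then lets a later step decide whether the lexeme is a keyword. The names `lex_ident_prefix`, `classify`, and `TokenKind` are hypothetical, the keyword list is a tiny sample, and the character checks are ASCII-only approximations of XID_Start / XID_Continue.

```rust
#[derive(Debug, PartialEq)]
enum TokenKind {
    Ident,
    Keyword,
}

/// Stage 1 ("lexer"): take the longest prefix matching
/// (XID_Start | '_') XID_Continue*, approximated with ASCII checks.
fn lex_ident_prefix(input: &str) -> Option<&str> {
    let mut iter = input.char_indices();
    let (_, first) = iter.next()?;
    if !(first.is_ascii_alphabetic() || first == '_') {
        return None;
    }
    // Find the byte offset of the first character that cannot continue
    // the identifier; if there is none, the whole input is the lexeme.
    let end = iter
        .find(|&(_, c)| !(c.is_ascii_alphanumeric() || c == '_'))
        .map(|(i, _)| i)
        .unwrap_or(input.len());
    Some(&input[..end])
}

/// Stage 2 ("parser"): decide whether an identifier-like lexeme is a keyword.
/// The list is a small sample for illustration; `_` is included to mirror
/// the behavior described above.
fn classify(lexeme: &str) -> TokenKind {
    const SAMPLE_KEYWORDS: [&str; 4] = ["fn", "let", "match", "_"];
    if SAMPLE_KEYWORDS.contains(&lexeme) {
        TokenKind::Keyword
    } else {
        TokenKind::Ident
    }
}

fn main() {
    for src in ["_ = 1", "_foo + 1", "let x", "1x"] {
        match lex_ident_prefix(src) {
            Some(lexeme) => println!("{:?} -> {:?} ({:?})", src, lexeme, classify(lexeme)),
            None => println!("{:?} -> not identifier-like", src),
        }
    }
}
```

The point of the split is that stage 1 never needs to know about `_` specially; only the later classification treats it differently from an ordinary identifier.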
I'm not sure how best to document this, but I think one option is to have it be something specific to proc-macros. _ is very carefully not an identifier in the rest of the reference because of how it interacts with everything else (like patterns, macro_rules, etc.). One thought is to keep proc-macro-specific documentation in the proc-macro API docs (for example, Ident could more clearly document that _ is considered a keyword, as it already mentions its behavior around keywords). But then we would need to consider what happens if a proc-macro generates a token stream with that identifier, which I think might end up as a keyword?
So maybe _ could be changed to be a keyword? That would be a bit unusual since it more closely matches punctuation, though I suppose the distinction isn't all that important.
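For the macro_rules interaction mentioned above, a small sketch (the macro name `classify` is just for illustration) shows that the `ident` fragment specifier does not accept a lone `_`, while `_foo` is accepted and `_` can still be matched as a literal token:

```rust
macro_rules! classify {
    // Matches identifiers such as `foo` or `_foo`, but not a lone `_`.
    ($i:ident) => {
        "ident"
    };
    // A literal `_` token in the matcher.
    (_) => {
        "underscore"
    };
    // Any other single token tree.
    ($t:tt) => {
        "other"
    };
}

fn main() {
    println!("foo  -> {}", classify!(foo));
    println!("_foo -> {}", classify!(_foo));
    println!("_    -> {}", classify!(_));
    println!("+    -> {}", classify!(+));
}
```

If `_` were matched by `$i:ident`, the first arm would win for `classify!(_)`; instead it falls through to the second arm.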