
Grammar for "identifier or keyword" seems wrong

Open · chichid opened this issue 2 years ago · 4 comments

Based on the grammar reference for identifiers, an identifier consisting only of '_' should be invalid, since:

'_' does not match XID_Start XID_Continue*, and it does not match '_' XID_Continue+ either (it is not followed by at least one XID_Continue).

I think the rule should be:

IDENTIFIER_OR_KEYWORD :
      (XID_Start | '_') XID_Continue*

Or:

IDENTIFIER_OR_KEYWORD :
       XID_Start XID_Continue*
     | '_' XID_Continue*
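To make the difference concrete, here is a small sketch that encodes both the reference rule (XID_Start XID_Continue* | '_' XID_Continue+) and the proposed rule ((XID_Start | '_') XID_Continue*) as matchers. It is an assumption-laden illustration, not rustc's lexer: XID_Start and XID_Continue are approximated with ASCII-only checks (the real rules use the full Unicode tables), and the function names are hypothetical. Note that the two proposed alternatives above are equivalent, so only one matcher is needed for them.

```rust
// ASCII-only stand-ins for the Unicode XID properties (assumption: the real
// grammar uses the full Unicode XID_Start / XID_Continue tables).
// As in Unicode, '_' is XID_Continue but not XID_Start.
fn is_xid_start(c: char) -> bool {
    c.is_ascii_alphabetic()
}

fn is_xid_continue(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Reference rule: XID_Start XID_Continue* | '_' XID_Continue+
fn matches_reference_rule(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if is_xid_start(c) => chars.all(is_xid_continue),
        // A leading '_' must be followed by at least one XID_Continue.
        Some('_') => {
            let rest = chars.as_str();
            !rest.is_empty() && rest.chars().all(is_xid_continue)
        }
        _ => false,
    }
}

/// Proposed rule: (XID_Start | '_') XID_Continue*
fn matches_proposed_rule(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if is_xid_start(c) || c == '_' => chars.all(is_xid_continue),
        _ => false,
    }
}
```

The only input on which the two rules disagree is a lone '_': the reference rule rejects it, while the proposed rule accepts it.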

chichid · Jul 06 '22 15:07

'_' is indeed meant not to be a valid identifier; it is categorized as punctuation instead. As such,

IDENTIFIER_OR_KEYWORD :
     XID_Start XID_Continue*
   | '_' XID_Continue+

really is correct.
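One user-visible consequence of '_' not being an identifier: in a pattern, '_' is the wildcard and binds nothing, whereas a name like _named is an ordinary identifier binding that merely silences the unused-variable lint. The sketch below (the Noisy type and the thread-local LOG are hypothetical names for illustration) records drop order to show that `let _ = ...;` drops its temporary at the end of the statement, while `let _named = ...;` keeps the value alive until the end of the scope.

```rust
use std::cell::RefCell;

thread_local! {
    // Records the order in which events happen.
    static LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

struct Noisy(&'static str);

impl Drop for Noisy {
    fn drop(&mut self) {
        LOG.with(|log| log.borrow_mut().push(self.0));
    }
}

fn demo() -> Vec<&'static str> {
    LOG.with(|log| log.borrow_mut().clear());
    {
        // `_` is a wildcard pattern, not a binding: the temporary is
        // dropped at the end of this statement.
        let _ = Noisy("wildcard");
        // `_named` is a real identifier: the value lives until the end of
        // the enclosing block.
        let _named = Noisy("named");
        LOG.with(|log| log.borrow_mut().push("end of scope"));
    }
    LOG.with(|log| log.borrow().clone())
}
```

Calling demo() yields the order "wildcard", "end of scope", "named".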

bjorn3 · Jul 06 '22 16:07

bjorn3, hmm, I made this playground, and it seems that at least the proc-macro APIs parse "_" as an ident. So either the reference is wrong, or the proc-macro API is.

The same thing can be verified locally with proc_macro instead of proc-macro2, but you'll need to define a proc macro to do so.

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=345fe7f26edd72f2c8a95b1c7fd74535


chichid · Jul 06 '22 16:07

It looks like the lexer does indeed consider '_' to be an identifier, while the parser considers it to be a keyword. (As far as the lexer is concerned, all keywords are identifiers.)

IDENTIFIER_OR_KEYWORD :
      (XID_Start | '_') XID_Continue*

is closest to the actual logic the lexer implements.
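The two-stage behaviour described here (the lexer emits every word as an identifier, and keywords, '_' included, are only distinguished afterwards) can be sketched with a toy tokenizer. This is a hypothetical model, not rustc's implementation: identifiers use an ASCII approximation of (XID_Start | '_') XID_Continue*, and KEYWORDS is an abbreviated, illustrative table rather than Rust's full keyword list.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),
    Keyword(String),
    Other(char),
}

// Hypothetical, abbreviated keyword table. "_" is included to mirror the
// parser's treatment described above; this is not the full list of Rust keywords.
const KEYWORDS: &[&str] = &["_", "fn", "let", "match"];

// ASCII approximation of (XID_Start | '_') XID_Continue*.
fn is_ident_start(c: char) -> bool {
    c.is_ascii_alphabetic() || c == '_'
}

fn is_ident_continue(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Stage 1: every word, keywords included, comes out as a raw identifier.
fn lex(src: &str) -> Vec<Token> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if is_ident_start(c) {
            let start = i;
            i += 1;
            while i < chars.len() && is_ident_continue(chars[i]) {
                i += 1;
            }
            tokens.push(Token::Ident(chars[start..i].iter().collect()));
        } else {
            // Everything else is treated as single-character punctuation.
            tokens.push(Token::Other(c));
            i += 1;
        }
    }
    tokens
}

/// Stage 2: a later pass reclassifies reserved words as keywords.
fn classify(tokens: Vec<Token>) -> Vec<Token> {
    tokens
        .into_iter()
        .map(|t| match t {
            Token::Ident(s) if KEYWORDS.contains(&s.as_str()) => Token::Keyword(s),
            other => other,
        })
        .collect()
}
```

With this model, lexing `let _ = x;` yields Ident("_"), and only classify() turns it into Keyword("_"), while `_x` stays an ordinary identifier.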

bjorn3 · Jul 06 '22 18:07

I'm not sure how best to document this, but I think one option is to make it something specific to proc-macros. '_' is very carefully not an identifier in the rest of the reference because of how it interacts with everything else (patterns, macro_rules, etc.). One thought is to keep proc-macro-specific documentation in the proc-macro API docs (for example, Ident could document more clearly that '_' is considered a keyword, since it already mentions its behavior around keywords). But then we would need to consider what happens if a proc-macro generates a token stream with that identifier, which I think might end up as a keyword?

So maybe '_' could be changed to be a keyword? That would be a bit unusual, since it more closely resembles punctuation, though I suppose the distinction isn't all that important.

ehuss · Jun 27 '23 03:06