Clean up and consolidate the lexical specification.
The lexical specification needs some cleanup and organization. Some things I can think of:
- [ ] There should be an overall introduction and overview of the lexical structure.
- [x] Paths are in the Lexical chapter, but I don't think they should be. #937.
- [x] The UTF8BOM/SHEBANG definition is floating in a chapter outside of the Lexical chapter. I think it is relevant to lexing, so it should be somehow incorporated in the Lexical chapter. (Not sure how, probably need to rearrange things a little.) #1459
- [x] I think there should be an appendix consolidating all the Lexer rules blocks. This should be generated automatically. Done in https://github.com/rust-lang/reference/pull/1787; see https://doc.rust-lang.org/nightly/reference/grammar.html#lexer-summary
- [x] The "input format" subchapter is almost completely useless, and could be moved somewhere else. #1459
- [ ] There should be a note about token ambiguity (this can be relatively brief, but should be mentioned). How it is handled depends on the lexer/parser implementation. `rustc` lexes the longest token and splits it into smaller parts when the parser needs them. The `proc_macro` parser only issues the smaller single-character tokens and uses their `Spacing` to determine whether they should be combined later on. The tokens that I'm aware of that cause this issue are:
| Token | Possibly Split Into |
|---|---|
| `+=` | `+ =` |
| `&&` | `& &` |
| `\|\|` | `\| \|` |
| `<<` | `< <` |
| `<-` | `< -` |
| `>>` | `> >` |
| `>>=` | `> >=` |
| `>=` | `> =` |
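To make the `Spacing` approach concrete, here is a minimal sketch of how single-character punctuation tokens could be issued and recombined. It does not use the real `proc_macro` crate (which is only available inside proc-macro crates). The `Spacing` enum mirrors the two variants of `proc_macro::Spacing`, while `Punct`, `split`, and `glue` are hypothetical names invented for this illustration. The gluing is also simplified: it joins every `Joint` run, whereas a real consumer would only combine characters that form a known operator.

```rust
/// Mirrors the two variants of `proc_macro::Spacing` (local stand-in, not the real type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Spacing {
    /// Immediately followed by another punctuation character.
    Joint,
    /// Followed by whitespace, a non-punctuation token, or end of input.
    Alone,
}

/// Hypothetical stand-in for `proc_macro::Punct`: one character plus its spacing.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Punct {
    ch: char,
    spacing: Spacing,
}

/// Issue one `Punct` per character of a multi-character operator.
/// Every character except the last is `Joint`; the last is `Alone`.
fn split(op: &str) -> Vec<Punct> {
    let chars: Vec<char> = op.chars().collect();
    chars
        .iter()
        .enumerate()
        .map(|(i, &ch)| Punct {
            ch,
            spacing: if i + 1 < chars.len() {
                Spacing::Joint
            } else {
                Spacing::Alone
            },
        })
        .collect()
}

/// Simplified recombination: glue each run of `Joint` puncts onto the
/// `Alone` punct that ends it. Assumption: any joint run is a valid operator.
fn glue(puncts: &[Punct]) -> Vec<String> {
    let mut out = Vec::new();
    let mut current = String::new();
    for p in puncts {
        current.push(p.ch);
        if p.spacing == Spacing::Alone {
            out.push(std::mem::take(&mut current));
        }
    }
    // A trailing `Joint` run with no closing `Alone` is still emitted.
    if !current.is_empty() {
        out.push(current);
    }
    out
}

fn main() {
    let parts = split(">>=");
    println!("{:?}", parts);
    println!("{:?}", glue(&parts));
}
```

Under this model, a consumer that wants `> >=` instead of `>>=` (for example, when closing a generic argument list) can simply stop gluing after the first character, because the smaller tokens were never merged in the first place.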
See also: https://github.com/rust-lang-nursery/wg-grammar/issues/3 https://internals.rust-lang.org/t/pre-pre-rfc-canonical-lexer-specification/4099