Before the lexical grammar, it might be worth specifying how a source file is decoded from bytes to code units.
For example, the Go spec mandates UTF-8:
"Source code is Unicode text encoded in UTF-8."
Maybe this could go at the top of ¶ Syntax.
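
To make the decoding step concrete, here is a minimal Go sketch of what a UTF-8 mandate could mean for an implementation. It is only an illustration under assumptions: `decodeSource` is a hypothetical name, and it decodes to Unicode code points (Go runes) and rejects invalid input outright. A spec that defines source text in terms of UTF-16 code units would need a further conversion step, and might choose a different error policy.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// decodeSource is a hypothetical helper, not taken from any spec.
// It turns the raw bytes of a source file into Unicode code points,
// rejecting input that is not well-formed UTF-8.
func decodeSource(src []byte) ([]rune, error) {
	if !utf8.Valid(src) {
		return nil, errors.New("source is not valid UTF-8")
	}
	// Converting a valid UTF-8 string to []rune yields one rune per code point.
	return []rune(string(src)), nil
}

func main() {
	// Well-formed input: the two-byte sequence for "é" becomes one code point.
	runes, err := decodeSource([]byte("x = \"é\""))
	fmt.Println(len(runes), err)

	// Ill-formed input: 0xff can never appear in UTF-8.
	_, err = decodeSource([]byte{0xff, 0xfe})
	fmt.Println(err)
}
```

Rejecting ill-formed input up front means the lexical grammar only ever has to be defined over valid code points, which is the main benefit of stating the encoding before the grammar.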