Strings cannot contain literal tab characters
There's a tab character in the string below, if Chrome and/or GitHub didn't strip it out.
define constant $string = " ";
Compiling it results in:
Invalid token beginning """ encountered.
test-parser.dylan:372: -
test-parser.dylan:373: define constant $string = " ";
test-parser.dylan:374: -
Changing the tab to spaces fixes the problem.
I guess rejecting it is technically correct, and I certainly wouldn't ever put a literal tab in a string constant on purpose, but it's damned hard to debug because the tab is invisible. I think we should have a better error message here.
You can fix this by altering sources/dfmc/reader/lexer-transitions.dylan in one of two ways: either let the lexer accept the invalid characters and then detect them in make-string-literal (issuing an error via note(...)), or have the lexer transition to a state that does the error notification.
I suspect the former is easier?
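A rough model of the first option, in Python rather than Dylan: it assumes the #"double-quote" state has been widened to accept tab, so the whole literal reaches a post-lex check that plays the role of make-string-literal. The function, the Diagnostic class, and the message text are hypothetical illustrations, not the dfmc note(...) API. The severity parameter shows how the same check could warn instead of fail.

```python
from dataclasses import dataclass


@dataclass
class Diagnostic:
    severity: str  # "error" or "warning"
    line: int
    column: int
    message: str


def check_string_literal(token_text, line, column, severity="error"):
    """Scan the raw text of a string token, quotes included.

    line/column give the 1-based position of the opening quote. Returns one
    Diagnostic per literal tab, pointing at the tab itself rather than at the
    start of the token.
    """
    diagnostics = []
    for offset, ch in enumerate(token_text):
        if ch == "\t":
            diagnostics.append(Diagnostic(
                severity, line, column + offset,
                "literal tab character in string literal; use \\t instead"))
    return diagnostics


for d in check_string_literal('"a\tb"', line=373, column=27):
    print(f"{d.severity}: {d.line}:{d.column}: {d.message}")
```

Because the token has already been lexed, the check knows the exact offset of each tab, so the message can name the invisible character directly.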
The part of lexer-transitions.dylan that you want to look at is this one:
state(#"double-quote", #f,
      pair('"', #"string"),
      pair('\\', #"double-quote-escape"),
      pair(" !#-[]-~", #"double-quote"),
      pair($ascii-8-bit-extensions, #"double-quote")),
state(#"string", make-string-literal),
state(#"double-quote-escape", #f,
      pair("\\'\"abefnrt0", #"double-quote"),
      pair('<', #"double-quote-escape-less")),
state(#"double-quote-escape-less", #f,
      pair("0-9a-fA-F", #"double-quote-hex-char-digits")),
state(#"double-quote-hex-char-digits", #f,
      pair("0-9a-fA-F", #"double-quote-hex-char-digits"),
      pair('>', #"double-quote")),
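To see why the tab falls through, the character set " !#-[]-~" can be expanded by hand. The Python below assumes that "x-y" denotes an inclusive range, as in "0-9a-fA-F", and leaves out $ascii-8-bit-extensions (presumably bytes 0x80 and up, which would not include tab either). The expand helper is an illustration, not the lexer's own set parser.

```python
def expand(spec):
    """Expand a set spec such as " !#-[]-~"; 'x-y' is an inclusive range."""
    chars = set()
    i = 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "-":
            chars.update(map(chr, range(ord(spec[i]), ord(spec[i + 2]) + 1)))
            i += 3
        else:
            chars.add(spec[i])
            i += 1
    return chars


plain = expand(" !#-[]-~")
# ASCII characters that get no plain transition out of #"double-quote".
rejected = [c for c in map(chr, range(0x80)) if c not in plain]

print("tab accepted:", "\t" in plain)
print("rejected printable:", [c for c in rejected if c.isprintable()])
```

Under that reading, the set is exactly printable ASCII (space through ~) minus " and \, which have their own transitions. Tab and every other control character match nothing in #"double-quote", so the lexer gives up on the whole token.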
A user just ran into this problem. I'll submit a PR for it at some point; in the meantime, some notes:
- It could be useful to make the lexer accept an otherwise clause in each state. Without one, it will be slightly annoying to have to enumerate the set of characters that are illegal in #"double-quote-escape", for example.
- Why is Tab illegal inside strings? A literal tab seems perfectly valid, if somewhat error-prone. (Python accepts literal tabs silently! Go allows them too.)
- I think explicitly transitioning to an error state in the lexer would be best, because it should (I believe; this needs testing) give a more precise location for the error.
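The two lexer ideas in these notes, a per-state otherwise target and explicit error states, can be modelled together in a small Python state machine. The transition table is transcribed from the lexer-transitions.dylan excerpt; the otherwise mechanism, the error-state names bad-string-char and bad-string-escape, and the assumption that $ascii-8-bit-extensions covers 0x80-0xFF are all hypothetical. This is not the Open Dylan lexer, and whether the real lexer would report the same location still needs testing.

```python
def expand(spec):
    """Expand a set spec such as "0-9a-fA-F"; 'x-y' is an inclusive range."""
    chars = set()
    i = 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "-":
            chars.update(map(chr, range(ord(spec[i]), ord(spec[i + 2]) + 1)))
            i += 3
        else:
            chars.add(spec[i])
            i += 1
    return chars


HEX = expand("0-9a-fA-F")
# Assumption: $ascii-8-bit-extensions covers code points 0x80-0xFF.
PLAIN = expand(" !#-[]-~") | set(map(chr, range(0x80, 0x100)))

# state -> (explicit transitions, hypothetical "otherwise" target)
STATES = {
    "double-quote": (
        [({'"'}, "string"), ({"\\"}, "double-quote-escape"), (PLAIN, "double-quote")],
        "bad-string-char"),
    "double-quote-escape": (
        [(expand("\\'\"abefnrt0"), "double-quote"), ({"<"}, "double-quote-escape-less")],
        "bad-string-escape"),
    "double-quote-escape-less": (
        [(HEX, "double-quote-hex-char-digits")],
        "bad-string-escape"),
    "double-quote-hex-char-digits": (
        [(HEX, "double-quote-hex-char-digits"), ({">"}, "double-quote")],
        "bad-string-escape"),
}

# Error states report the offending character, not just where the token began.
ERRORS = {
    "bad-string-char": "character {ch!r} (code {code}) is not allowed in a string literal",
    "bad-string-escape": "invalid character {ch!r} in escape sequence",
}


def lex_string(source):
    """Lex a string literal at the start of source (must begin with a double quote).

    Returns ("ok", end_offset) or ("error", offset_of_bad_char, message).
    """
    state = "double-quote"
    for i in range(1, len(source)):
        ch = source[i]
        transitions, otherwise = STATES[state]
        # First matching explicit transition wins; otherwise fall back.
        state = next((target for chars, target in transitions if ch in chars), otherwise)
        if state == "string":
            return ("ok", i + 1)
        if state in ERRORS:
            return ("error", i, ERRORS[state].format(ch=ch, code=ord(ch)))
    return ("error", len(source), "unterminated string literal")


print(lex_string('"a\tb"'))
```

With one otherwise target per state, nothing has to enumerate the illegal characters, and because the error state is entered at the bad character its offset is known, so the message can show the invisible tab as '\t' (code 9).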
The short answer to "why is Tab disallowed" is "because the DRM (Dylan Reference Manual) says so". I would support issuing a warning but accepting the string as valid.