opendylan

Strings cannot contain literal tab characters

Open cgay opened this issue 12 years ago • 4 comments

There's a tab character in the string below, if Chrome and/or GitHub didn't strip it out.

define constant $string = " ";

Compiling it results in

Invalid token beginning """ encountered.

test-parser.dylan:372:                           -
test-parser.dylan:373: define constant $string = "  ";
test-parser.dylan:374:                           -

Changing the tab to spaces fixes the problem.

cgay, Mar 17 '13 22:03

I guess rejecting the tab is technically valid, and I certainly wouldn't ever put a literal tab in a string constant on purpose, but it's damned hard to debug since the tab is invisible. I think we should have a better error message here.

cgay, Mar 17 '13 22:03

You can fix this by altering sources/dfmc/reader/lexer-transitions.dylan in one of two ways: either have the lexer accept the invalid characters and then detect them in make-string-literal (issuing an error via note(...)), or have the lexer transition to a state that does the error notification.

I suspect that the former is probably easier?

The part of lexer-transitions.dylan that you want to look at is this one:

       state(#"double-quote", #f,
             pair('"', #"string"),
             pair('\\', #"double-quote-escape"),
             pair(" !#-[]-~", #"double-quote"),
             pair($ascii-8-bit-extensions, #"double-quote")),
       state(#"string", make-string-literal),
       state(#"double-quote-escape", #f,
             pair("\\'\"abefnrt0", #"double-quote"),
             pair('<', #"double-quote-escape-less")),
       state(#"double-quote-escape-less", #f,
             pair("0-9a-fA-F", #"double-quote-hex-char-digits")),
       state(#"double-quote-hex-char-digits", #f,
             pair("0-9a-fA-F", #"double-quote-hex-char-digits"),
             pair('>', #"double-quote")),
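To see why a tab falls through the #"double-quote" state, the character spec " !#-[]-~" can be expanded by hand. The following is a minimal Python model, not compiler code: it assumes that "x-y" inside a spec means an inclusive range and that any other character stands for itself, which is how the specs in the excerpt read. The helper name expand_char_spec is hypothetical.

```python
def expand_char_spec(spec):
    """Expand a spec such as "0-9a-f" into the set of characters it names.

    Assumption: "x-y" is an inclusive range and every other character is
    literal. This is inferred from the excerpt, not from the compiler source.
    """
    chars = set()
    i = 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "-":
            lo, hi = ord(spec[i]), ord(spec[i + 2])
            chars.update(chr(c) for c in range(lo, hi + 1))
            i += 3
        else:
            chars.add(spec[i])
            i += 1
    return chars


# Characters that keep the lexer inside a string, per the excerpt.
DOUBLE_QUOTE_BODY = expand_char_spec(" !#-[]-~")

for name, ch in [("space", " "), ("tab", "\t"), ("quote", '"'), ("backslash", "\\")]:
    print(name, ch in DOUBLE_QUOTE_BODY)
```

Under that reading the set covers 0x20, 0x21, 0x23 through 0x5B, and 0x5D through 0x7E. The quote and backslash are left out because they have their own transitions; the tab (0x09) is left out with no transition at all, which would explain why the lexer gives up and reports the error at the opening quote.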

waywardmonkeys, Mar 18 '13 02:03

A user just ran into this problem. I'll submit a PR for it at some point, but here are some notes in the meantime:

  • It seems it could be useful to make the lexer accept an otherwise clause in each state. Without one, it would be slightly annoying to have to enumerate the set of characters that are illegal in, for example, #"double-quote-escape".
  • Why is Tab illegal inside strings? It seems perfectly valid, if somewhat error prone. (Python and Go both allow literal tabs in string literals.)
  • I think explicitly transitioning to an error state in the lexer would be best because it should give a more precise location for the error (I believe; this needs testing).
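The first and third notes can be sketched together. The following is a small Python model, not compiler code: each state carries a hypothetical "otherwise" target, and an unmatched character moves the lexer to a named error state that reports the index of the offending character instead of the opening quote. The state names echo lexer-transitions.dylan; the otherwise slot, the error-state names, LexError and scan_string are all illustrative assumptions. Hex escapes are left out to keep it short.

```python
class LexError(Exception):
    def __init__(self, state, char, index):
        super().__init__(f"{state}: {char!r} at index {index}")
        self.state = state
        self.index = index


# Printable ASCII minus the two characters that have their own transitions.
PRINTABLE = {chr(c) for c in range(0x20, 0x7F)} - {'"', "\\"}

# state -> (explicit transitions, otherwise target). Hypothetical layout.
TABLE = {
    "double-quote": (
        {**{c: "double-quote" for c in PRINTABLE},
         '"': "string",
         "\\": "double-quote-escape"},
        "illegal-string-char",
    ),
    "double-quote-escape": (
        {c: "double-quote" for c in "\\'\"abefnrt0"},
        "illegal-escape",
    ),
}


def scan_string(text, start):
    """Scan a string literal whose opening quote is at text[start].

    Returns the index just past the closing quote, or raises LexError
    pointing at the first character with no explicit transition.
    """
    state = "double-quote"
    pos = start + 1
    while pos < len(text):
        explicit, otherwise = TABLE[state]
        state = explicit.get(text[pos], otherwise)
        if state == "string":
            return pos + 1
        if state not in TABLE:  # landed in an error state
            raise LexError(state, text[pos], pos)
        pos += 1
    raise LexError("unterminated-string", "", pos)


try:
    scan_string('x = "a\tb";', 4)
except LexError as err:
    print(err)
```

With a fallback per state, nobody has to list the illegal characters, and the error carries the position of the tab itself. Whether the real lexer can surface that position in the same way is the part that still needs testing.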

cgay, Apr 16 '23 16:04

The short answer to "why is Tab disallowed" is "because the DRM (Dylan Reference Manual) says so". I would support issuing a warning but accepting the string as valid.
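As a rough illustration of the warn-but-accept policy, here is a minimal Python sketch, again a model and not compiler code. The function name read_string_body and the wording of the diagnostic are hypothetical, and escape processing is omitted so that only the tab handling is shown.

```python
def read_string_body(body, line):
    """Return (value, diagnostics) for the text between the quotes.

    A literal tab is kept in the value, and one warning is recorded per
    tab so the invisible character is pointed out to the user.
    """
    diagnostics = []
    for offset, ch in enumerate(body):
        if ch == "\t":
            diagnostics.append(
                f"line {line}: warning: literal tab in string at offset "
                f"{offset}; consider writing \\t instead"
            )
    return body, diagnostics


value, diagnostics = read_string_body("a\tb", 373)
for message in diagnostics:
    print(message)
```

The string compiles unchanged, and the warning names the exact spot of the tab, which addresses the original complaint that the character is invisible in the source.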

cgay, Apr 16 '23 16:04