- lexer.rl: handle CRLF as a line separator

Open iliabylich opened this issue 1 year ago • 1 comments

Closes https://github.com/whitequark/parser/issues/1020.

3.3.1 :004 > Parser::CurrentRuby.parse("1\r\n2\r\n3").children[2].loc
 => #<Parser::Source::Map::Operator:0x00000001222f0f80 @expression=#<Parser::Source::Range (string) 6...7>, @node=s(:int, 3), @operator=nil>

3.3.1 :005 > Parser::CurrentRuby.parse("1\r\n2\r\n3").children[2].loc.expression.source
 => "3"

A few notes:

When \r\n is the line separator, the parser still emits a tNL token located at the \n character.
tSTRING_CONTENT tokens now have correct locations, but their content omits the \r of \r\n (because eval(%{"\r\n"}) is just "\n"), so the .source of their location doesn't match the string content. I guess that's fine; the same happens with all escape sequences anyway.
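
To illustrate the "same happens with all escape sequences" point, here is a minimal plain-Ruby sketch (no parser gem involved) of how the source text of a literal and its evaluated content already differ for an ordinary escape. The variable names are made up for this example, and slicing off the quotes is only a rough stand-in for what a location's .source would return.

```ruby
# Source text of a string literal: the two characters backslash + t
# sit between the double quotes.
literal_source = '"a\tb"'

# Evaluating the literal resolves the escape into a single TAB character.
value = eval(literal_source)

# Roughly what a location's .source would cover (the text between the quotes).
inner_source = literal_source[1...-1]

p inner_source            # the escape is still spelled out as two characters
p value                   # the escape has been resolved to one character
p inner_source == value   # source text and content differ
```

According to the note above, the \r\n case behaves analogously: the source range includes the \r, while the string content does not.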

If it doesn't break RuboCop's test suite, I guess it's safe to merge as is.

@kddnewton Could you take a look at this please? Does it fix Prism's translator?

Jun 08 '24 06:06 iliabylich

This gets close, but it runs into issues when an escaped \r is followed by a literal \n and the two then get grouped, as in:

<<EOS
foo\rbar
baz\r
EOS

(There are regular newline characters after each line.) In this PR, the last escaped \r and the newline after it get grouped into \r\n, because the gsub happens after escape sequences are resolved.
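
The ordering problem can be reproduced in plain Ruby. This is only a sketch under assumptions: resolve_escapes and normalize_crlf are hypothetical helpers written for this example, not the parser gem's lexer code, and the toy escape resolver handles only the backslash-r sequence.

```ruby
# Toy stand-in for escape resolution: turns the two-character sequence
# backslash + "r" into a real carriage return. Hypothetical helper.
def resolve_escapes(raw)
  raw.gsub('\r') { "\r" }
end

# Collapse a literal CR LF pair into a single LF. Hypothetical helper.
def normalize_crlf(text)
  text.gsub("\r\n", "\n")
end

# Heredoc body as written in the source file: each \r is an escape
# (backslash + r) and each line ends with a plain "\n".
raw_body = "foo\\rbar\nbaz\\r\n"

normalize_then_resolve = resolve_escapes(normalize_crlf(raw_body))
resolve_then_normalize = normalize_crlf(resolve_escapes(raw_body))

p normalize_then_resolve  # the escaped \r survives next to the newline
p resolve_then_normalize  # the escaped \r is swallowed with the newline
```

In this sketch, normalizing line endings before resolving escapes keeps the carriage return that the escape produced, while the reverse order drops it, which matches the grouping described above.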

Jun 10 '24 12:06 kddnewton