parser icon indicating copy to clipboard operation
parser copied to clipboard

highlight line should handle case where size == 0

Open rubys opened this issue 4 years ago • 4 comments
trafficstars

Reproduction instructions:

irb(main):003:0> Parser::CurrentRuby.parse('foo++')
(string):1:6: error: unexpected token $end
(string):1: foo++
(string):1:
Traceback (most recent call last):
...

Note the blank line where there should be either a caret or a tilde indicating the spot of the error.

The code in question is in Parser::Diagnostic::render_line.

I'd submit a pull request, but it isn't clear to me whether the code should be putting out a caret or a tilde in this case. I have verified that column_range.begin_pos is the spot immediately after the end of the line in this case.

rubys avatar Jan 13 '21 00:01 rubys

$end is a special token that has begin = end = EOF, this is why it has zero source range.

Some other languages also don't render it:

$ node -e 'foo**'
[eval]:1
foo**


SyntaxError: Unexpected end of input
    at new Script (vm.js:88:7)
    at createScript (vm.js:263:10)
    at Object.runInThisContext (vm.js:311:10)
    at Object.<anonymous> ([eval]-wrapper:10:26)
    at Module._compile (internal/modules/cjs/loader.js:1138:30)
    at evalScript (internal/process/execution.js:94:25)
    at internal/main/eval_string.js:23:3

$ perl -e '++'
syntax error at -e line 1, at EOF
Execution of -e aborted due to compilation errors.

but some languages do:

$ python -c 'foo++'
  File "<string>", line 1
    foo++
        ^
SyntaxError: invalid syntax

$ echo "class ++" > test.java && javac test.java
test.java:1: error: <identifier> expected
class ++
     ^
test.java:1: error: reached end of file while parsing
class ++
        ^
2 errors

I think it's ok to change it. A few options:

  1. we could emit $eof token with a EOF...EOF+1 range, but I can't estimate implications of this change. Potentially it can break libraries that depend on parser.
  2. we could check for error_token_id == 0 in Parser::Base#on_error and if it's zero add +1 to location.end, but I guess it's not a part of the Racc stable API, i.e. it can change anytime.
  3. we could emit a different error message for $eof token, but it seems to be a breaking change.

Option 1 seems to be the best if nobody relies on this behavior. @mbj @palkan do you remember any places in your code using it? For rubocop we'll know it from CI 😄

@rubys Feel free to send a PR (if not I'll do it myself in a few days). lexer.rl should return [ false, [ '$eof'.freeze, range(eof, eof + 1) ] ] in #advance method, also it can break some unit tests, so expectations should be updated too.

iliabylich avatar Jan 13 '21 17:01 iliabylich

I actually don't believe that any change to the tokens currently emitted is needed. At the moment, render_line is called with the following values:

range = #<Parser::Source::Range (string) 5...5>
range_end = false

Given this information, all that needs to be decided is:

  • what character to use. either caret and tilde seem to be reasonable choices
  • what position to place this character. For this input, either column 5 or column 6 (counting the first character as one) seem to be reasonable choices. I'd actually lean towards column 6 given that the current output contains the following line:
(string):1:6: error: unexpected token $end

rubys avatar Jan 13 '21 23:01 rubys

do you remember any places in your code using it?

Nope 🙂

palkan avatar Jan 14 '21 11:01 palkan

@rubys Yes, but then how can you know when this custom rule should be applied? If we apply it to all zero-length ranges we can easily miss some bugs for other diagnostic messages. Same for EOF...EOF ranges in general. This additional logic should be applied only to unexpected token $eof error.

what character to use. either caret and tilde seem to be reasonable choices

I'd use caret.

what position to place this character. For this input, either column 5 or column 6 (counting the first character as one) seem to be reasonable choices. I'd actually lean towards column 6 given that the current output contains the following line:

Agree, so for foo++ we should get

foo++
     ^

iliabylich avatar Jan 14 '21 11:01 iliabylich