parser
parser copied to clipboard
highlight line should handle case where size == 0
Reproduction instructions:
irb(main):003:0> Parser::CurrentRuby.parse('foo++')
(string):1:6: error: unexpected token $end
(string):1: foo++
(string):1:
Traceback (most recent call last):
...
Note the blank line where there should be either a caret or a tilde indicating the spot of the error.
The code in question is in Parser::Diagnostic::render_line.
I'd submit a pull request, but it isn't clear to me whether the code should be putting out a caret or a tilde in this case. I have verified that column_range.begin_pos is the spot immediately after the end of the line in this case.
$end is a special token that has begin = end = EOF, this is why it has zero source range.
Some other languages also don't render it:
$ node -e 'foo**'
[eval]:1
foo**
SyntaxError: Unexpected end of input
at new Script (vm.js:88:7)
at createScript (vm.js:263:10)
at Object.runInThisContext (vm.js:311:10)
at Object.<anonymous> ([eval]-wrapper:10:26)
at Module._compile (internal/modules/cjs/loader.js:1138:30)
at evalScript (internal/process/execution.js:94:25)
at internal/main/eval_string.js:23:3
$ perl -e '++'
syntax error at -e line 1, at EOF
Execution of -e aborted due to compilation errors.
but some languages do:
$ python -c 'foo++'
File "<string>", line 1
foo++
^
SyntaxError: invalid syntax
$ echo "class ++" > test.java && javac test.java
test.java:1: error: <identifier> expected
class ++
^
test.java:1: error: reached end of file while parsing
class ++
^
2 errors
I think it's ok to change it. A few options:
- we could emit
$eoftoken with a EOF...EOF+1 range, but I can't estimate implications of this change. Potentially it can break libraries that depend onparser. - we could check for
error_token_id == 0inParser::Base#on_errorand if it's zero add +1 tolocation.end, but I guess it's not a part of the Racc stable API, i.e. it can change anytime. - we could emit a different error message for
$eoftoken, but it seems to be a breaking change.
Option 1 seems to be the best if nobody relies on this behavior. @mbj @palkan do you remember any places in your code using it? For rubocop we'll know it from CI 😄
@rubys Feel free to send a PR (if not I'll do it myself in a few days). lexer.rl should return [ false, [ '$eof'.freeze, range(eof, eof + 1) ] ] in #advance method, also it can break some unit tests, so expectations should be updated too.
I actually don't believe that any change to the tokens currently emitted is needed. At the moment, render_line is called with the following values:
range = #<Parser::Source::Range (string) 5...5>
range_end = false
Given this information, all that needs to be decided is:
- what character to use. either caret and tilde seem to be reasonable choices
- what position to place this character. For this input, either column 5 or column 6 (counting the first character as one) seem to be reasonable choices. I'd actually lean towards column 6 given that the current output contains the following line:
(string):1:6: error: unexpected token $end
do you remember any places in your code using it?
Nope 🙂
@rubys Yes, but then how can you know when this custom rule should be applied? If we apply it to all zero-length ranges we can easily miss some bugs for other diagnostic messages. Same for EOF...EOF ranges in general. This additional logic should be applied only to unexpected token $eof error.
what character to use. either caret and tilde seem to be reasonable choices
I'd use caret.
what position to place this character. For this input, either column 5 or column 6 (counting the first character as one) seem to be reasonable choices. I'd actually lean towards column 6 given that the current output contains the following line:
Agree, so for foo++ we should get
foo++
^