textmapper
textmapper copied to clipboard
report line and column of syntax error when parsing
To help troubleshoot the cause of a syntax errors when parsing, it would be really helpful if the parser generated by textmapper reported both line and column of syntax errors when parsing.
Currently only the line number is reported, and this creates an additional step for users of the parser to figure out the exact cause of the parse error. One recent such example is https://github.com/llir/llvm/issues/105#issuecomment-559043489, the extract of which is included here for completeness.
By @dannypsnl:
By the way, would you like to use official IR Parser by submodule LLVM-IR-parser and mapping ast only? textmapper didn't provide a reasonable result for parsing error. I didn't know which step it stuck.
All I got was: syntax error at line 1
Here is the test code:
package asm import ( "testing" "github.com/llir/ll/ast" ) func TestParse(t *testing.T) { testCode := `!7 = !DIExpression(DW_OP_LLVM_convert, 16, DW_ATE_unsigned, DW_OP_LLVM_convert, 32, DW_ATE_signed)` _, err := ast.Parse("", testCode) if err != nil { t.Error(err) } }
To get a better parse error, we ended up breaking the line into multiple lines, resulting in syntax error at line 5
, which was more helpful. The same could be achieved with a line:column
pair when reporting syntax errors.
ref: https://github.com/llir/llvm/issues/105#issuecomment-559184390
!7
=
!DIExpression(DW_OP_LLVM_convert,
16,
DW_ATE_unsigned,
DW_OP_LLVM_convert,
32,
DW_ATE_signed
)
Note, this might be me failing to use functionality already included in Textmapper. I simply use the generated parser and don't try to do anything fancy handling errors. So perhaps this information is already available?
Cheers, Robin
I used to have column tracking in parsers written in Java but it turned out we needed it so rarely that I dropped the support for it. Instead, I optionally track the offset of the current line (or rather of the token start line) in the lexer:
tokenLineOffset = true // enables this
Besides, columns are very weird as there are multiple interpretations for this number: bytes, unicode runes, or JavaScript offsets (measured in utf-16 code units). All browsers use the last one.
I can embed the line offset into the SyntaxError type if you need it.
Besides, columns are very weird as there are multiple interpretations for this number: bytes, unicode runes, or JavaScript offsets (measured in utf-16 code units). All browsers use the last one.
This being in Go, I'd probably go with Rune count (where tabs count as a single rune, since the tab width is otherwise subjective).
I can embed the line offset into the SyntaxError type if you need it.
That would be very helpful.