textmapper icon indicating copy to clipboard operation
textmapper copied to clipboard

report line and column of syntax error when parsing

Open mewmew opened this issue 5 years ago • 2 comments

To help troubleshoot the cause of a syntax errors when parsing, it would be really helpful if the parser generated by textmapper reported both line and column of syntax errors when parsing.

Currently only the line number is reported, and this creates an additional step for users of the parser to figure out the exact cause of the parse error. One recent such example is https://github.com/llir/llvm/issues/105#issuecomment-559043489, the extract of which is included here for completeness.

By @dannypsnl:

By the way, would you like to use official IR Parser by submodule LLVM-IR-parser and mapping ast only? textmapper didn't provide a reasonable result for parsing error. I didn't know which step it stuck.

All I got was: syntax error at line 1

Here is the test code:

package asm

import (
	"testing"

	"github.com/llir/ll/ast"
)

func TestParse(t *testing.T) {
	testCode := `!7 = !DIExpression(DW_OP_LLVM_convert, 16, DW_ATE_unsigned, DW_OP_LLVM_convert, 32, DW_ATE_signed)`
	_, err := ast.Parse("", testCode)
	if err != nil {
		t.Error(err)
	}
}

To get a better parse error, we ended up breaking the line into multiple lines, resulting in syntax error at line 5, which was more helpful. The same could be achieved with a line:column pair when reporting syntax errors.

ref: https://github.com/llir/llvm/issues/105#issuecomment-559184390

!7
=
!DIExpression(DW_OP_LLVM_convert,
16,
DW_ATE_unsigned,
DW_OP_LLVM_convert,
32,
DW_ATE_signed
)

Note, this might be me failing to use functionality already included in Textmapper. I simply use the generated parser and don't try to do anything fancy handling errors. So perhaps this information is already available?

Cheers, Robin

mewmew avatar Dec 04 '19 21:12 mewmew

I used to have column tracking in parsers written in Java but it turned out we needed it so rarely that I dropped the support for it. Instead, I optionally track the offset of the current line (or rather of the token start line) in the lexer:

tokenLineOffset = true // enables this

Besides, columns are very weird as there are multiple interpretations for this number: bytes, unicode runes, or JavaScript offsets (measured in utf-16 code units). All browsers use the last one.

I can embed the line offset into the SyntaxError type if you need it.

inspirer avatar Dec 04 '19 21:12 inspirer

Besides, columns are very weird as there are multiple interpretations for this number: bytes, unicode runes, or JavaScript offsets (measured in utf-16 code units). All browsers use the last one.

This being in Go, I'd probably go with Rune count (where tabs count as a single rune, since the tab width is otherwise subjective).

I can embed the line offset into the SyntaxError type if you need it.

That would be very helpful.

mewmew avatar Dec 04 '19 22:12 mewmew