source_span
Computed column numbers are not consistent with editors for surrogate pairs
I tested this with dart-sass's error reporting, which relies on the source_span package (via SourceFile.fromString).
Given this invalid input, it reports an error at the ;:
```scss
a {
  b: "👭a"(;
}
```

```
Error: expected ")".
  ╷
2 │   b: "👭a"(;
  │            ^
  ╵
  test.scss 2:12  root stylesheet
```
The error is reported at column 12. However, all the editors I tried (VS Code, IntelliJ IDEs, gedit) report this ; at column 11. They seem to count columns in Unicode code points (or perhaps glyphs), not in UTF-16 code units.
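A minimal sketch of the discrepancy. It is written in TypeScript because JavaScript strings, like Dart strings, are sequences of UTF-16 code units, so String.length, indexOf and charCodeAt behave like Dart's length, indexOf and codeUnitAt. It recomputes the column of the ; in the example line both ways; the variable names are illustrative and are not part of source_span.

```typescript
// The second line of the example stylesheet, including its indentation.
// 👭 (U+1F46D) is a single code point stored as two UTF-16 code units.
const line = '  b: "👭a"(;';

// 0-based offset of ";" in UTF-16 code units (native string indexing).
const codeUnitOffset = line.indexOf(";");

// Array.from iterates a string by code point, so a surrogate pair counts once.
const codePointOffset = Array.from(line.slice(0, codeUnitOffset)).length;

// Editors and error messages display 1-based columns.
const codeUnitColumn = codeUnitOffset + 1; // what dart-sass reports
const codePointColumn = codePointOffset + 1; // what the editors report

console.log(`code units: ${codeUnitColumn}, code points: ${codePointColumn}`);
// code units: 12, code points: 11
```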
Yeah, this package was written before package:characters existed, and it uses length and codeUnitAt, which I think don't handle surrogate pairs; they only see UTF-16 code units.
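To show why length and codeUnitAt see two units for 👭, here is a TypeScript sketch in which charCodeAt plays the role of Dart's codeUnitAt. It also shows one way a code-unit-based package could count code points without package:characters: skip any low surrogate that directly follows a high surrogate. The codePointColumn helper is hypothetical and is not part of source_span.

```typescript
// UTF-16 encodes code points above U+FFFF as a surrogate pair: a high
// surrogate (0xD800-0xDBFF) followed by a low surrogate (0xDC00-0xDFFF).
function isHighSurrogate(unit: number): boolean {
  return unit >= 0xd800 && unit <= 0xdbff;
}

function isLowSurrogate(unit: number): boolean {
  return unit >= 0xdc00 && unit <= 0xdfff;
}

// Hypothetical helper: converts a 0-based UTF-16 offset within a line into a
// 0-based code point offset, using only charCodeAt (the analog of codeUnitAt).
// A low surrogate directly after a high surrogate is not counted, so each
// surrogate pair advances the column by one instead of two.
function codePointColumn(line: string, codeUnitOffset: number): number {
  let column = 0;
  for (let i = 0; i < codeUnitOffset; i++) {
    const unit = line.charCodeAt(i);
    const secondHalfOfPair =
      i > 0 && isLowSurrogate(unit) && isHighSurrogate(line.charCodeAt(i - 1));
    if (!secondHalfOfPair) column++;
  }
  return column;
}

const emoji = "👭"; // U+1F46D
console.log(emoji.length); // 2
console.log(emoji.charCodeAt(0).toString(16), emoji.charCodeAt(1).toString(16)); // d83d dc6d

const sample = "x👭👭;";
console.log(codePointColumn(sample, sample.indexOf(";"))); // 3
```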
@lrhn, would you expect that migrating from the native String APIs to package:characters would solve this?
I think you will likely need both.
A source span represents a slice of source. The source can be represented as UTF-8 code units, UTF-16 code units, or a string. It sounds like this package uses UTF-16 code units as its default representation. In that case, the length should be in code units: that is the correct length in the source, and it tells you which characters the span covers in an easily accessible way.
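To make the representations concrete, this TypeScript sketch (TypeScript strings are UTF-16, like Dart's) measures the 0-based offset of the same ; as UTF-8 bytes, UTF-16 code units and code points. TextEncoder is the standard Web/Node encoder; nothing here is source_span's own API.

```typescript
// 👭 (U+1F46D) is one code point, two UTF-16 code units and four UTF-8 bytes.
const line = '  b: "👭a"(;';

// 0-based offset of ";" in UTF-16 code units (native string indexing).
const utf16Offset = line.indexOf(";");
const prefix = line.slice(0, utf16Offset);

// The same position expressed in the other two representations.
const utf8Offset = new TextEncoder().encode(prefix).length;
const codePointOffset = Array.from(prefix).length;

console.log(`UTF-8: ${utf8Offset}, UTF-16: ${utf16Offset}, code points: ${codePointOffset}`);
// UTF-8: 13, UTF-16: 11, code points: 10

// With code-unit offsets, extracting the spanned text is a plain slice.
console.log(line.slice(utf16Offset, utf16Offset + 1)); // ;
```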
Making the source use Characters is an unnecessary complication for that.
For printing, and for showing offsets to users, you are more likely to want user-perceived characters. The offset into the line should probably be counted in grapheme clusters. So a source span is just a range of code units, but the position in the file, given as user-understandable line and column numbers, should be computed using characters/glyphs.
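The split suggested here, with offsets stored in UTF-16 code units and only the displayed column computed from user-perceived characters, could look like this TypeScript sketch. The Span type and the graphemeCount and displayColumn helpers are hypothetical, not source_span's API; grapheme clusters come from the standard Intl.Segmenter (assumed available, as in Node 16+). The example uses a decomposed é (e followed by U+0301), because for 👭 code points and grapheme clusters happen to coincide.

```typescript
// Hypothetical span type: offsets stay in UTF-16 code units.
interface Span {
  start: number; // inclusive
  end: number; // exclusive
}

// Number of user-perceived characters (extended grapheme clusters) in text.
// The cast only sidesteps TypeScript lib settings older than ES2022, which
// lack type definitions for Intl.Segmenter.
function graphemeCount(text: string): number {
  const segmenter = new (Intl as any).Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text)).length;
}

// 1-based column for messages, counted in grapheme clusters before the span.
function displayColumn(line: string, span: Span): number {
  return graphemeCount(line.slice(0, span.start)) + 1;
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT renders as a single "é".
const line = "  cafe\u0301(;";
const semicolon = line.indexOf(";");
const span: Span = { start: semicolon, end: semicolon + 1 };

console.log(line.slice(span.start, span.end)); // ; (sliced by code units)
console.log(Array.from(line.slice(0, span.start)).length + 1); // 9 (code points)
console.log(displayColumn(line, span)); // 8 (grapheme clusters)
```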
For printing the ^, you don't need the length of the span at all; you need the width of the printed glyphs as rendered. That may be one cell per grapheme cluster, but that assumes a fixed-width font, and even with a fixed-width font, emojis aren't necessarily the same width. The only really safe way to align the ^ on a new line is to print the same preceding characters again in invisible ink (but since ANSI conceal is not well supported, that likely ends up as black-on-black or white-on-white). If you have that kind of formatting available, maybe just use ANSI underline instead.
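Both terminal-based alternatives could be sketched in TypeScript as follows, assuming an ANSI (ECMA-48) capable terminal. The helper names underlineSpan and concealedCaretLine are hypothetical. Underlining the span in place leaves glyph widths entirely to the terminal; the concealed variant reprints the preceding text in "invisible ink" so the caret should land under the span, but only where SGR 8 (conceal) is honored.

```typescript
// SGR escape sequences: 4 = underline, 24 = underline off,
// 8 = conceal ("invisible ink"), 28 = reveal.
const CSI = "\x1b[";

// Underlines the UTF-16 range [start, end) of line in place, so no column
// arithmetic is needed and the terminal handles glyph widths itself.
function underlineSpan(line: string, start: number, end: number): string {
  return (
    line.slice(0, start) +
    `${CSI}4m` + line.slice(start, end) + `${CSI}24m` +
    line.slice(end)
  );
}

// Reprints the text before the span concealed, then the caret. Because the
// hidden prefix is the same characters, it should take up the same cells
// as the visible line above it (if the terminal ignores SGR 8, it shows).
function concealedCaretLine(line: string, start: number): string {
  return `${CSI}8m${line.slice(0, start)}${CSI}28m^`;
}

const line = '  b: "👭a"(;';
const semicolon = line.indexOf(";");
console.log(underlineSpan(line, semicolon, semicolon + 1));
console.log(line);
console.log(concealedCaretLine(line, semicolon));
```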
I'm not talking about the position of the ^ here (it is indeed also off, but fixing that in span.highlight would probably be complex because of the widths of printed glyphs).
I'm talking about the column reported in the Sass stack trace, which comes from span.start.column. That's the 12 in test.scss 2:12, which should be 11 to match the position that every editor I tried shows for that character.