Label location and width do not account for character length

Open esoterra opened this issue 3 years ago • 6 comments

Label locations do not correctly account for the actual width of characters.

The specific case I tested was ☃, which is treated as being 3 columns wide because it is 3 bytes in UTF-8.

Here is a snippet of output for the issue.

LexerError { src: NamedSource { name: "bad.wrt", source: "<redacted>", span: SourceSpan { offset: SourceOffset(4), length: SourceOffset(3) } }

  × The input did not match a token rule
   ╭─[bad.wrt:1:1]
 1 │ abc ☃ abc
   ·     ─┬─
   ·      ╰── This text was not recognized
   ╰────
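To make the mismatch concrete, here is a minimal sketch (standard library only) that converts a byte-based span like the one above into a span counted in `char`s. The helper name `byte_span_to_char_span` is hypothetical and not part of miette. Counting `char`s fixes this particular case, but it is still not the same thing as terminal column width.

```rust
/// Convert a byte-based span within `line` into a (start, length) pair
/// counted in `char`s (Unicode scalar values) instead of bytes.
/// Hypothetical helper for illustration; not a miette API.
/// Returns `None` if the span is out of bounds or splits a code point.
fn byte_span_to_char_span(line: &str, offset: usize, len: usize) -> Option<(usize, usize)> {
    let end = offset.checked_add(len)?;
    // `str::get` returns `None` instead of panicking on a bad boundary.
    let before = line.get(..offset)?;
    let inside = line.get(offset..end)?;
    Some((before.chars().count(), inside.chars().count()))
}
```

For the line `abc ☃ abc` with byte offset 4 and byte length 3, this yields a start of 4 and a length of 1, so the underline would cover a single column instead of three.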

esoterra avatar Dec 16 '21 23:12 esoterra

This is a common issue; you need to comply with UAX #29.

tasogare3710 avatar Dec 28 '21 07:12 tasogare3710

This should be possible to fix with the unicode-width crate.

Aloso avatar Jun 19 '22 22:06 Aloso

This issue requires an exact count of user-perceived characters (a.k.a. grapheme clusters), and UAX #11 has nothing to do with this issue.

See also http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries

tasogare3710 avatar Jun 21 '22 15:06 tasogare3710

Yup, and we can probably use something like unicode_segmentation to get that. The other issue is that if terminals display graphemes with different widths (e.g. ö̲ vs a), then grapheme count isn't a perfect representation of horizontal offset.

esoterra avatar Jun 21 '22 15:06 esoterra

Is ö̲ a combining character sequence?

tasogare3710 avatar Jun 21 '22 16:06 tasogare3710

@tasogare3710 It does not require an exact count of the user-perceived characters; it requires the exact width of the user-perceived characters in a terminal. Since some characters, such as emojis and East Asian characters, are usually 2 columns wide instead of 1, label locations will still be incorrect if you just count grapheme clusters.

I suggested the unicode-width crate because that's apparently what helix uses.
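As a rough illustration of the difference between counting characters and measuring columns, here is a standard-library-only sketch. The functions `approx_char_width`, `approx_display_width`, and `approx_column` are hypothetical, and the code point ranges are a small hand-picked subset, not the full Unicode data that a crate like unicode-width would consult. It treats combining diacritical marks as zero columns and a few East Asian and emoji ranges as two columns.

```rust
/// Approximate terminal column width of one `char`.
/// ASSUMPTION: these ranges are an incomplete, illustrative subset of the
/// Unicode East Asian Width and combining-mark data, not a real width table.
fn approx_char_width(c: char) -> usize {
    match c as u32 {
        // Combining Diacritical Marks: drawn over the previous character.
        0x0300..=0x036F => 0,
        // Hiragana and Katakana.
        0x3041..=0x30FF => 2,
        // CJK Unified Ideographs.
        0x4E00..=0x9FFF => 2,
        // Hangul syllables.
        0xAC00..=0xD7A3 => 2,
        // Fullwidth ASCII variants.
        0xFF01..=0xFF60 => 2,
        // Emoticons block.
        0x1F600..=0x1F64F => 2,
        // Everything else is assumed to take one column.
        _ => 1,
    }
}

/// Approximate number of terminal columns occupied by `s`.
fn approx_display_width(s: &str) -> usize {
    s.chars().map(approx_char_width).sum()
}

/// Approximate column where a label starting at byte `offset` should begin.
/// Returns `None` if `offset` is out of bounds or splits a code point.
fn approx_column(line: &str, offset: usize) -> Option<usize> {
    line.get(..offset).map(approx_display_width)
}
```

Under these assumptions, `日本語` measures 6 columns although it is 3 characters, and `o` followed by U+0308 and U+0332 measures 1 column although it is 3 `char`s. Control characters and many other ranges are not handled here.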

Aloso avatar Jun 21 '22 16:06 Aloso