Count grapheme clusters, not code points

Open nad opened this issue 11 years ago • 0 comments

The alignment feature of lhs2TeX --poly treats the string +̲ (a + followed by a combining character) as having length two. It seems more reasonable to treat it as having length one, since it occupies a single column when displayed properly in a monospace font. I suggest that lhs2TeX count "grapheme clusters" rather than code points (see http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries).

If you use text-icu (Data.Text.ICU), it seems that a grapheme cluster counter can be implemented as follows, assuming that you want to use the current locale:

  length . breaks (breakCharacter Current)
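
For context, here is a minimal, self-contained sketch of how that one-liner might be wrapped and compared against a plain code point count. It assumes the text and text-icu packages are installed; the name graphemeLength is hypothetical and not part of lhs2TeX or text-icu, and I have not checked how this would plug into lhs2TeX's alignment code.

```haskell
module Main (main) where

import qualified Data.Text as T
import Data.Text.ICU (LocaleName (Current), breakCharacter, breaks)

-- | Hypothetical helper: count grapheme clusters by counting the
-- segments produced by ICU's character break iterator for the
-- current locale.
graphemeLength :: T.Text -> Int
graphemeLength = length . breaks (breakCharacter Current)

main :: IO ()
main = do
  -- '+' followed by U+0332 COMBINING LOW LINE, as in the example above.
  let s = T.pack "+\x0332"
  -- Number of code points (what the alignment currently seems to count).
  print (T.length s)
  -- Number of grapheme clusters (what this issue proposes counting).
  print (graphemeLength s)
```

For the string above, the first count is two (one per code point); the second should be one if ICU groups the combining character with its base, which is the behaviour proposed here.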

nad, Mar 22 '13 21:03