lhs2tex
Count grapheme clusters, not code points
The alignment feature of lhs2TeX --poly treats the string +̲ (a + followed by a combining character) as having length two. It seems more reasonable to treat it as having length one, since it occupies a single column when displayed properly in a monospace font. I suggest that lhs2TeX count "grapheme clusters" rather than code points when computing alignment (http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries).
If you use text-icu (Data.Text.ICU), it seems that a grapheme cluster counter can be implemented as follows, assuming that you want to use the current locale:
length . breaks (breakCharacter Current)
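To make the difference concrete, here is a minimal sketch that compares the two counts for the string from the report. It assumes the combining character is U+0332 COMBINING LOW LINE and that the text-icu package is installed; the name graphemeCount is only illustrative and is not part of lhs2TeX or text-icu.

```haskell
import qualified Data.Text as T
import Data.Text.ICU (LocaleName (Current), breakCharacter, breaks)

-- | Count grapheme clusters using ICU's character break rules for the
-- current locale. (Hypothetical helper name, chosen for this example.)
graphemeCount :: T.Text -> Int
graphemeCount = length . breaks (breakCharacter Current)

main :: IO ()
main = do
  -- '+' followed by U+0332 COMBINING LOW LINE (assumed to match the report)
  let s = T.pack "+\x0332"
  putStrLn ("code points:       " ++ show (T.length s))
  putStrLn ("grapheme clusters: " ++ show (graphemeCount s))
```

If the counts differ as the report suggests, using something like graphemeCount in place of a plain length when computing column positions would keep such tokens aligned. Whether the current locale is the right choice for lhs2TeX is an open question; for character boundaries the locale probably matters little.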