miller
Combining characters cause wrong alignment
PPRINT output is not aligned correctly when values contain combining characters, e.g.:
$ echo "name,age\nกา,24\nก่า,25" | mlr --c2p cat
name age
กา 24
ก่า 25
(Note: ก่า = ก + ่ + า)
looking ...
OK, I'll need to research this. The C and Go versions give the same results, and since Go handles UTF-8 quite well, I'm surprised.
P.S. It's even easier to see using '$len = strlen($name)':
the lengths come out as 2 and 3, respectively.
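To illustrate why those two lengths differ from what the terminal shows, here is a minimal Python sketch. It is not Miller's code; `display_width` and `pad_to` are hypothetical helpers, and the width rule (skip combining marks and format characters, count East Asian wide characters as two columns) is a simplification of what terminals actually do.

```python
import unicodedata

def display_width(text: str) -> int:
    """Approximate the number of terminal columns `text` occupies."""
    width = 0
    for ch in text:
        # Combining marks (Mn, Me) and format characters (Cf) take no column.
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # Wide and fullwidth characters take two columns, everything else one.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

def pad_to(text: str, columns: int) -> str:
    """Left-justify `text` using display width instead of code point count."""
    return text + " " * max(0, columns - display_width(text))

plain = "\u0e01\u0e32"          # ก + า
marked = "\u0e01\u0e48\u0e32"   # ก + ่ (combining tone mark) + า

print(len(plain), len(marked))                      # code point counts: 2 3
print(display_width(plain), display_width(marked))  # column counts: 2 2
```

Padding by `len` gives the second value one space too few, which matches the misalignment reported above; padding with something like `pad_to` would line the columns up.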
Workaround:
$ echo "name,age\nกา,24\nก่า,25" | mlr --c2t cat | column -t -s $'\t'
name age
กา 24
ก่า 25
Here is another straightforward use case, this time with TSV output:
$ echo 'café,1234' | mlr --c2t --implicit-csv-header cat
1 2
caf� 1234
But it works when only the input format is CSV (the output is then Miller's default DKVP format):
$ echo 'café,1234' | mlr --icsv --implicit-csv-header cat
1=café,2=1234
@agguser the workaround doesn't seem to work with UTF-8 output:
$ echo 'café,1234' | mlr --c2t --implicit-csv-header cat | column -t -s $'\t'
1 2
caf\xc3 1234
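The `caf\xc3` above is what you get when only the first byte of the two-byte UTF-8 encoding of é (0xC3 0xA9) survives. The Python sketch below reproduces that symptom; it is only an illustration, `truncate_bytes` is a hypothetical helper, and nothing here is confirmed to be what Miller does internally.

```python
def truncate_bytes(text: str, nbytes: int) -> bytes:
    """Cut the UTF-8 encoding of `text` at a byte offset, ignoring character boundaries."""
    return text.encode("utf-8")[:nbytes]

word = "caf\u00e9"  # "café" with a precomposed é

print(len(word))                  # 4 code points
print(len(word.encode("utf-8")))  # 5 bytes, because é takes two

cut = truncate_bytes(word, 4)     # assumes a width of 4 is applied to bytes
print(cut)                        # b'caf\xc3'

# A terminal (or a lenient decoder) shows the dangling byte as U+FFFD.
print(cut.decode("utf-8", errors="replace"))
```

If a field is sized by code point count (4) but that size is then applied to the byte string (5 bytes), the last byte is dropped, which would explain both the `caf�` and the `caf\xc3` outputs.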