Combining characters cause wrong alignment

Open agguser opened this issue 3 years ago • 5 comments

PPRINT output does not align correctly when there are combining characters, e.g.

$ echo "name,age\nกา,24\nก่า,25" | mlr --c2p cat
name age
กา   24
ก่า  25

(Note: ก่า = ก + ่ + า)

agguser avatar Oct 24 '20 18:10 agguser

looking ...

johnkerl avatar Oct 24 '20 22:10 johnkerl

OK, I'll need to research this. The C and Go implementations give the same results, and Go is quite good at UTF-8 handling, so I'm surprised.

johnkerl avatar Oct 26 '20 03:10 johnkerl

P.S. It's even easier to see using '$len = strlen($name)' and noting that the lengths are 2 and 3, respectively.
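
To make the mismatch concrete, here is a rough Python sketch (not Miller's code) that compares the code point count of each name with an approximate display width. The helpers `display_width` and `pad_right` are hypothetical names made up for this illustration, and the width rule (skip nonspacing and enclosing marks, count East Asian wide characters as two columns) is a simplification of what a terminal really does.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate number of terminal columns occupied by text."""
    width = 0
    for ch in text:
        # Nonspacing and enclosing marks stack on the previous character,
        # so they take no column of their own.
        if unicodedata.category(ch) in ("Mn", "Me"):
            continue
        # East Asian wide/fullwidth characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad_right(text: str, columns: int) -> str:
    """Pad with spaces based on display width, not code point count."""
    return text + " " * max(0, columns - display_width(text))


plain = "\u0e01\u0e32"          # KO KAI + SARA AA
marked = "\u0e01\u0e48\u0e32"   # KO KAI + MAI EK (combining) + SARA AA

for name in (plain, marked):
    print(len(name), display_width(name))

# Padding by display width keeps the second column aligned.
for name, age in [("name", "age"), (plain, "24"), (marked, "25")]:
    print(pad_right(name, 5) + age)
```

The code point counts are 2 and 3, matching the lengths reported above, while both names occupy two columns. Padding by code point count therefore leaves the second row one space short.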

johnkerl avatar Oct 26 '20 03:10 johnkerl

Workaround:

$ echo "name,age\nกา,24\nก่า,25" | mlr --c2t cat | column -t -s $'\t'
name  age
กา    24
ก่า    25

agguser avatar Oct 28 '20 02:10 agguser

Here is another straightforward use case:

$ echo 'café,1234'  | mlr --c2t --implicit-csv-header cat
1       2
caf�    1234

But it works with CSV input and the default (DKVP) output:

$ echo 'café,1234'  | mlr --icsv --implicit-csv-header cat
1=café,2=1234

@agguser the workaround doesn't seem to work with UTF-8 output:

$ echo 'café,1234'  | mlr --c2t --implicit-csv-header cat | column -t -s $'\t'
1        2
caf\xc3  1234
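
The `\xc3` above is the first of the two bytes that encode é in UTF-8 (`0xC3 0xA9`). One possible reading, which this thread does not confirm, is that somewhere in the pipeline the field is cut or measured by bytes rather than by characters. The Python sketch below only illustrates what that kind of truncation would look like; it is not taken from Miller or from `column`.

```python
text = "caf\u00e9"              # "café" with a precomposed é (U+00E9)
encoded = text.encode("utf-8")  # é becomes the two bytes 0xC3 0xA9

print(len(text), len(encoded))  # 4 code points, 5 bytes

# Cutting after 4 bytes splits é in the middle of its encoding.
truncated = encoded[:4]
print(truncated)

try:
    truncated.decode("utf-8")
except UnicodeDecodeError as err:
    print("invalid UTF-8:", err.reason)

# A lenient decoder substitutes U+FFFD, the replacement character.
print(truncated.decode("utf-8", errors="replace"))
```

A dangling `0xC3` byte is invalid UTF-8 on its own, so a terminal or tool may render it as the replacement character or as an escaped `\xc3`.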

adren avatar Jun 06 '22 21:06 adren