csvkit
UTF8 issue on Windows 10
The full error is:
'charmap' codec can't encode character '\u0107' in position 86: character maps to <undefined>
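For context, this message can be reproduced in plain Python. The sketch assumes the machine's ANSI code page is cp1252, which is typical for Western-locale Windows installs; the issue does not state the locale. cp1252 is implemented as a charmap codec and has no mapping for U+0107 (ć) or U+02A4 (ʤ). `encode_error` is a hypothetical helper for illustration, not csvkit code. The reported position will differ from the issue (0 here versus 86 there) because only one character is encoded.

```python
from typing import Optional


def encode_error(text: str, encoding: str) -> Optional[str]:
    """Return the UnicodeEncodeError message, or None if encoding succeeds."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


# ASSUMPTION: cp1252 is the relevant code page. It has no slot for U+0107 or U+02A4.
print(encode_error("\u0107", "cp1252"))
print(encode_error("\u02a4", "cp1252"))
# UTF-8 can represent every code point, so this prints None.
print(encode_error("\u0107\u02a4", "utf-8"))
```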
I also tested with the test_utf8.csv file:
> csvcut -c baz .\test_utf8.csv > few.csv
'charmap' codec can't encode character '\u02a4' in position 0: character maps to <undefined>
> csvcut -e utf-8-sig -c baz .\test_utf8.csv > few.csv
'charmap' codec can't encode character '\u02a4' in position 0: character maps to <undefined>
> csvcut -e utf-8 -c baz .\test_utf8.csv > few.csv
'charmap' codec can't encode character '\u02a4' in position 0: character maps to <undefined>
> csvcut -e utf8 -c baz .\test_utf8.csv > few.csv
'charmap' codec can't encode character '\u02a4' in position 0: character maps to <undefined>
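"Can't encode" means the failure happens when text is written out, not when the CSV is read. csvkit documents `-e`/`--encoding` as the encoding of the input file, which would explain why all four commands above fail in the same way.

The rough sketch below assumes that the redirected stdout (`> few.csv`) is encoded as cp1252. Python typically chooses that encoding for redirected output on a Western-locale Windows install. `RAW` and `run` are hypothetical stand-ins, not csvkit internals.

```python
# Hypothetical stand-in for test_utf8.csv: UTF-8 bytes with no BOM.
RAW = "baz\n\u02a4\n".encode("utf-8")


def run(input_encoding: str, output_encoding: str) -> str:
    """Decode RAW as the input side would, then encode as the output side would."""
    text = RAW.decode(input_encoding)  # input side: what -e/--encoding controls
    try:
        text.encode(output_encoding)  # output side: the stream behind "> few.csv"
    except UnicodeEncodeError as exc:
        return f"error: {exc}"
    return "ok"


# ASSUMPTION: the redirected stdout is encoded as cp1252.
for enc in ("utf-8-sig", "utf-8", "utf8"):
    print(f"{enc}: cp1252 -> {run(enc, 'cp1252')}; utf-8 -> {run(enc, 'utf-8')}")
```

Every input encoding decodes the bytes without trouble. The only thing that changes the outcome is the output encoding.

If this diagnosis holds, the setting to change is on the output side. One option is Python's `PYTHONIOENCODING=utf-8` environment variable; another is UTF-8 mode via `PYTHONUTF8=1` on Python 3.7+, set before running `csvcut`. Neither option has been tested against the versions in this report.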
Like my file, test_utf8.csv is missing a BOM.
Therefore CMD and PowerShell interpret them as ASCII.
I thought that opening the file with a hex editor and pasting a BOM at the beginning would do the trick, but no: it is still interpreted as ASCII.
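For reference, a UTF-8 BOM is the three bytes EF BB BF, available in Python as `codecs.BOM_UTF8`. The helpers below, `has_utf8_bom` and `add_utf8_bom` (both hypothetical names), check for and prepend those bytes. This is the byte-level equivalent of the hex-editor edit.

The error above is raised while encoding output, so a BOM on the input file may not change anything. That would be consistent with what was observed. Treat this as a sketch for checking the file, not a confirmed fix.

```python
import codecs
import tempfile
from pathlib import Path


def has_utf8_bom(path: Path) -> bool:
    """True if the file starts with the UTF-8 BOM bytes EF BB BF."""
    with path.open("rb") as fh:
        return fh.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def add_utf8_bom(path: Path) -> None:
    """Prepend a UTF-8 BOM unless one is already present (idempotent)."""
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        path.write_bytes(codecs.BOM_UTF8 + data)


# Usage on a throwaway file (hypothetical sample content, UTF-8 without a BOM).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_bytes("baz\n\u02a4\n".encode("utf-8"))
    before = has_utf8_bom(sample)
    add_utf8_bom(sample)
    add_utf8_bom(sample)  # second call is a no-op
    after = has_utf8_bom(sample)
    raw = sample.read_bytes()

print(before, after, raw[:3].hex())
```

Decoding the result with `utf-8-sig` strips the BOM again, so the text content is unchanged.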
This may be related to https://github.com/ipython/ipython/issues/10011
Closing, as some Windows issues have been fixed in csvkit and in newer Python versions. The Python and csvkit versions are not reported in the issue, so there is no reproducible example.