csvkit

--help should document options to --encoding

Open mtfurlan opened this issue 4 years ago • 3 comments

I just ran into an issue with csvjoin 1.0.5 where a file failed to parse with this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 377: invalid start byte

It didn't like Á.

In the end the solution was --encoding iso-8859-1, but it was really frustrating that the help said there was an encoding flag and said nothing about how to use it.
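For reference, the same failure can be reproduced in plain Python, outside csvkit. This is a minimal sketch, assuming the file was saved as ISO-8859-1; the sample row is made up for illustration and is not from the real file.

```python
# Hypothetical sample row: "Á" is the single byte 0xC1 in ISO-8859-1,
# and 0xC1 can never start a valid UTF-8 sequence.
raw = "MÁLAGA,1\n".encode("iso-8859-1")

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = str(exc)  # same kind of message as the one reported above

# Decoding with the encoding the file was actually written in succeeds.
decoded = raw.decode("iso-8859-1")

print(error)
print(decoded)
```

Passing --encoding iso-8859-1 presumably has the same effect as the second decode call here: the bytes are read with the codec they were written in.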

mtfurlan avatar Apr 16 '21 23:04 mtfurlan

Right now the --help has:

-e ENCODING, --encoding ENCODING
                      Specify the encoding of the input CSV file.

What do you suggest it say?

jpmckinney avatar May 12 '21 14:05 jpmckinney

Either have a list of formats in the help there, or add a way to list formats.

  • ffmpeg has -formats
  • convert does -list formats

I think having --encoding with no argument print the list would be most reasonable, but that's also not as easy as just putting the list in the help.

mtfurlan avatar May 12 '21 15:05 mtfurlan

In the meantime, you can refer to the available encodings in Python here: https://docs.python.org/3/library/codecs.html#standard-encodings

jpmckinney avatar May 24 '21 16:05 jpmckinney

Python apparently has no easy way to report the available encodings on a given platform.

I instead just added a link to those docs from https://csvkit.readthedocs.io/en/latest/common_arguments.html
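For anyone who wants a rough list locally, here is a sketch of one approximation using only the standard library. It is not an official or complete listing: it assumes the alias table in `encodings.aliases` covers the codecs of interest, and the helper name `is_known_encoding` is made up for this example. The result also includes non-text codecs such as base64, which are not useful for CSV input.

```python
import codecs
from encodings.aliases import aliases


def is_known_encoding(name):
    """Return True if this Python build can find a codec called `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# The alias table maps alternative spellings to codec module names.
# Some targets (for example Windows-only codecs) may be missing on a
# given platform, so each candidate is checked with codecs.lookup().
candidates = sorted(set(aliases.values()))
usable = [name for name in candidates if is_known_encoding(name)]

print(len(usable), "codecs found")
print(is_known_encoding("iso-8859-1"))
```

The `is_known_encoding` check alone is enough to validate a value before passing it to --encoding.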

jpmckinney avatar Oct 17 '23 20:10 jpmckinney

Cool, thanks.

mtfurlan avatar Oct 17 '23 20:10 mtfurlan