UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2854: character maps to <undefined>

Open LavanyaDS opened this issue 2 years ago • 4 comments

I'm trying to read a JSON file and I'm getting the error below:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2854: character maps to <undefined>

LavanyaDS avatar Nov 22 '22 10:11 LavanyaDS

I'm using the query below:

spyql "SELECT count_agg(*) AS n FROM json('file.json')"

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2854: character maps to <undefined>

LavanyaDS avatar Nov 22 '22 10:11 LavanyaDS

Hi @LavanyaDS. This is an encoding problem, probably the one described here (http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html#files-in-a-typical-platform-specific-encoding):

UnicodeDecodeError may be thrown when reading such files (if the data is not actually in the encoding returned by locale.getpreferredencoding())

One way to get the preferred encoding is:

$ spyql "IMPORT locale SELECT locale.getpreferredencoding()"
locale_getpreferredencoding
UTF-8

Most likely, the file's encoding does not match your system's preferred encoding. It is not yet obvious to me how spyql should support these cases. Maybe spyql should let you specify the encoding and/or choose how decoding errors are handled.
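
To illustrate the kind of mismatch I mean, here is a minimal Python sketch. It assumes the file is actually UTF-8 and that the preferred encoding is cp1252 (a common default on Windows); neither is confirmed in this thread. In UTF-8, the right double quotation mark ends in byte 0x9d, which Python's cp1252 codec leaves undefined:

```python
# Assumption: the file is UTF-8 but is being decoded as cp1252.
data = "\u201d".encode("utf-8")  # right double quotation mark
print(data)  # b'\xe2\x80\x9d'

try:
    data.decode("cp1252")  # what happens under a cp1252 default
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print(exc)  # a 'charmap' error about byte 0x9d

# Decoding with the encoding the bytes were written in works.
decoded = data.decode("utf-8")
```

If this is what is happening, any fix that makes the decoder use the file's real encoding should work.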

One thing you can try is reading from the standard input:

spyql "SELECT count_agg(*) AS n FROM json" < file.json

Other things you can try:

  • changing the system's preferred encoding
  • re-encoding the file
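
For the re-encoding option, a rough Python sketch could look like the one below. The helper name `reencode` and the output file name are hypothetical, and the source encoding (UTF-8 here) is a guess that you would need to adjust to your file:

```python
import locale


def reencode(src, dst, src_encoding="utf-8", dst_encoding=None):
    """Rewrite text file `src` as `dst` in another encoding (hypothetical helper)."""
    if dst_encoding is None:
        # Default to the encoding Python uses for open() on this system.
        dst_encoding = locale.getpreferredencoding(False)
    # newline="" keeps the original line endings untouched.
    with open(src, encoding=src_encoding, newline="") as fin, \
         open(dst, "w", encoding=dst_encoding, errors="replace", newline="") as fout:
        for line in fin:
            fout.write(line)
```

Usage would be something like `reencode("file.json", "file_local.json")`. Note that `errors="replace"` writes `?` for characters that do not exist in the target encoding, so this can lose information.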

Please let me know how it goes and whether you need further help (I will need to know which OS you are using).

dcmoura avatar Nov 22 '22 15:11 dcmoura

@LavanyaDS Were you able to solve the problem? If not, we can prioritise support for different encodings and for handling encoding errors.

dcmoura avatar Nov 25 '22 07:11 dcmoura

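@LavanyaDS There is one other option: setting the environment variable PYTHONIOENCODING. It lets you set both the encoding and the behavior on encoding errors for the standard streams (stdin/stdout/stderr), so it applies when reading from standard input.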

http://docs.python.org/3/using/cmdline.html#envvar-PYTHONIOENCODING
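
A small Python sketch of what the variable does, using a child Python process rather than spyql itself. The value `utf-8:replace` is only an example, and whether spyql picks it up depends on spyql reading its input through `sys.stdin`, which I am assuming here:

```python
import os
import subprocess
import sys

# The child process just reports how its standard input is configured.
child = "import sys; print(sys.stdin.encoding, sys.stdin.errors)"

# Syntax is encodingname:errorhandler; both parts are optional.
env = {**os.environ, "PYTHONIOENCODING": "utf-8:replace"}

result = subprocess.run(
    [sys.executable, "-c", child],
    env=env,
    input="",
    capture_output=True,
    text=True,
    check=True,
)
encoding, errors = result.stdout.split()
print(encoding, errors)
```

With the `replace` handler, undecodable bytes become replacement characters instead of raising UnicodeDecodeError, which is fine for counting rows but may alter string values.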

dcmoura avatar Nov 25 '22 21:11 dcmoura