spyql
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2854: character maps to <undefined>
I'm trying to read a JSON file and I'm getting the error below:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2854: character maps to <undefined>
I'm using the query below:
spyql "SELECT count_agg(*) AS n FROM json('file.json')"
Hi @LavanyaDS. This is an encoding problem, probably the one described here (http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html#files-in-a-typical-platform-specific-encoding):
UnicodeDecodeError may be thrown when reading such files (if the data is not actually in the encoding returned by locale.getpreferredencoding())
One way to get the preferred encoding is:
$ spyql "IMPORT locale SELECT locale.getpreferredencoding()"
locale_getpreferredencoding
UTF-8
Most likely there is a mismatch between the file's encoding and your system's preferred encoding. How spyql should support these cases is not completely obvious to me; maybe spyql should allow specifying the encoding and/or choosing how to handle decoding errors.
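To illustrate the kind of mismatch described above, here is a minimal Python sketch. It assumes (a guess, not something confirmed in this thread) that the file is UTF-8 and that the preferred encoding is cp1252, a common Windows default. The sample JSON and the helper `try_decode` are made up for illustration.

```python
# Hypothetical sample: UTF-8 JSON containing curly quotes (U+201C, U+201D).
# U+201D is encoded in UTF-8 as the bytes E2 80 9D.
raw = '{"quote": "\u201chello\u201d"}'.encode("utf-8")


def try_decode(data, encoding):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# Decoding with the matching codec works.
text, err = try_decode(raw, "utf-8")
print(ascii(text))

# cp1252 has no character assigned to byte 0x9d, so strict decoding
# fails with the same kind of 'charmap' error as in the report.
text, err = try_decode(raw, "cp1252")
print(err)
```

If the assumption holds, the byte 0x9d in the error message would be the tail of a UTF-8 encoded character such as a right curly quote.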
One thing you can try is reading from the standard input:
spyql "SELECT count_agg(*) AS n FROM json" < file.json
Other things you can try:
- changing the system's preferred encoding
- re-encoding the file
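As a sketch of the re-encoding option, the Python function below rewrites a file so that it contains only ASCII bytes, which decode the same way under UTF-8, cp1252, and other ASCII-compatible encodings. It assumes the source file really is UTF-8 and holds one JSON value per line. The name `reencode_json_lines` and its parameters are hypothetical, not part of spyql.

```python
import json


def reencode_json_lines(src_path, dst_path, src_encoding="utf-8"):
    """Rewrite a JSON-lines file so that it contains only ASCII bytes.

    Assumptions: one JSON value per line, and the source really is in
    `src_encoding`. Non-ASCII characters become \\uXXXX escapes.
    """
    with open(src_path, encoding=src_encoding) as src, \
            open(dst_path, "w", encoding="ascii", newline="\n") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            # ensure_ascii=True (the default) escapes non-ASCII characters
            dst.write(json.dumps(json.loads(line), ensure_ascii=True) + "\n")
```

For example, calling `reencode_json_lines("file.json", "file_ascii.json")` and then querying `file_ascii.json` should sidestep the decoding step that fails, provided the assumptions above hold.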
Please let me know how it went and whether you need further help (I will need to know which OS you are using).
@LavanyaDS Were you able to solve the problem? We can prioritise support for different encodings and for handling encoding errors.
@LavanyaDS There is one other option: setting the environment variable PYTHONIOENCODING. It sets the encoding and the error handling that Python uses for stdin/stdout/stderr, so it is relevant when the file is read from the standard input.
http://docs.python.org/3/using/cmdline.html#envvar-PYTHONIOENCODING
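A minimal sketch of what PYTHONIOENCODING does, using a child Python process rather than spyql itself. The payload, the `cp1252:replace` setting, and the helper `run_child` are illustrative assumptions. Whether spyql is affected in the same way depends on it reading input through Python's `sys.stdin`, which is not confirmed here.

```python
import os
import subprocess
import sys

# Child program: decode everything on stdin and report how many
# characters were read.
CHILD = "import sys; print(len(sys.stdin.read()))"


def run_child(payload, io_encoding):
    """Run a child Python process with PYTHONIOENCODING set to `io_encoding`."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run(
        [sys.executable, "-c", CHILD],
        input=payload, env=env, capture_output=True,
    )


# Hypothetical payload containing the byte 0x9d, which cp1252 cannot decode.
payload = b'{"a": "\x9d"}\n'

# "cp1252:replace" means: decode stdin as cp1252 and substitute U+FFFD
# for bytes that have no mapping, instead of raising UnicodeDecodeError.
result = run_child(payload, "cp1252:replace")
print(result.returncode, result.stdout.decode("ascii").strip())
```

The value has the form `encodingname:errorhandler`, and either part may be left out. Using `replace` trades the exception for silently altered characters, so setting the correct encoding is preferable when it is known.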