KindleClippings
Is it able to decode other languages such as Chinese?
I'm new to Python, but I think it's having an issue decoding Chinese. Maybe it needs encoding="utf-8"?
Error:
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 121: character maps to <undefined>
Did the error say which line was causing the problems? I don't think I've ever tried it on Chinese characters / Kanji etc.
Also, which Python version are you using?
I am running Python 3.8 but have tried 3.9 too. Thanks a bunch
Hi! Same problem trying to use your script, Python 3.7.6, and books in English and Spanish:
Traceback (most recent call last):
  File "KindleClippings.py", line 116, in <module>
    parse_clippings(source_file, destination)
  File "KindleClippings.py", line 57, in parse_clippings
    for highlight in f.read().split("=========="):
  File "d:\Miniconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1589: character maps to <undefined>
Thanks
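The two tracebacks above can probably be reproduced without a clippings file. This is a minimal sketch, assuming the file is UTF-8 and that Python on Windows falls back to cp1252 when no encoding is given (the second traceback goes through cp1252.py). The helper name try_decode and the sample characters are illustrative only, not part of KindleClippings.py.

```python
# Sketch: why cp1252 fails on UTF-8 text.
# "华" (U+534E) is E5 8D 8E in UTF-8 and the right curly quote (U+201D)
# is E2 80 9D; bytes 0x8d and 0x9d have no mapping in Python's cp1252 codec.
samples = {"Chinese character": "\u534e", "curly quote": "\u201d"}


def try_decode(raw, encoding):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return "error: {}".format(exc)


for label, text in samples.items():
    raw = text.encode("utf-8")
    # ascii() keeps the output printable on consoles that cannot show the text
    print(label, raw.hex(), "as cp1252:", ascii(try_decode(raw, "cp1252")))
    print(label, raw.hex(), "as utf-8: ", ascii(try_decode(raw, "utf-8")))
```

Decoding the same bytes as UTF-8 succeeds, which is consistent with the script working on machines whose default encoding is already UTF-8.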
Hi @aiturri,
Thanks for raising this. Would it be possible for you to share the part of the clipping file that is causing the errors? I'd love to try and fix this but can't reproduce the error.
Best, Robert
Sure! Thanks in advance!
Hi @aiturri,
I've just pushed a potential fix. Can you download the script again and try?
Otherwise you can manually modify line 55 to specify an encoding.
with open(source_file, "r", encoding="utf8") as f:
Let me know if it does or doesn't work. For the record, the original script worked fine on my machine with your clippings file so I couldn't verify the issue
Best, Robert
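For anyone applying the manual change, here is a rough sketch of what reading the file with an explicit encoding might look like. The open(...) call and the "==========" separator come from the thread; the function name read_highlights, the sample text, and the temporary file are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

SEPARATOR = "=========="  # separator the script splits on, per the traceback


def read_highlights(source_file, encoding="utf8"):
    """Read the clippings file with an explicit encoding and split it
    into non-empty highlight chunks (hypothetical helper)."""
    with open(source_file, "r", encoding=encoding) as f:
        chunks = f.read().split(SEPARATOR)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


# Usage with a made-up two-entry file containing accents and curly quotes
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "clippings.txt"
    sample.write_text(
        "Libro (Autor)\n\nEl ni\u00f1o dijo \u201chola\u201d\n"
        "==========\n"
        "Book (Author)\n\nA plain highlight\n"
        "==========\n",
        encoding="utf8",
    )
    highlights = read_highlights(sample)
```

Passing the encoding explicitly means the result no longer depends on the platform default, which is the likely difference between the Mac and Windows runs.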
Hi @robertmartin8, I tried again and it's still not working:
Traceback (most recent call last):
File "KindleClippings.py", line 116, in <module>
I will attach my original clippings file here so you can try it, but I will delete it as soon as you have downloaded it. Please let me know when you have it so I can remove it (for privacy reasons!).
Thanks again!
@aiturri OK, I've downloaded it. Feel free to remove
@aiturri I still can't reproduce it; I can parse your file, accents and all. I think it's a Mac/Windows issue.
Can you try again? I forgot to add encoding="utf8" to a couple of the file opens.
@robertmartin8
Traceback (most recent call last):
File "KindleClippings.py", line 117, in <module>
@aiturri OK, it seems this is related to a particular Windows encoding. Other people seem to have had the same issue.
(Please save your clippings file beforehand just in case)
I've pushed two fixes: the first just ignores the decoding errors. Have a go and see whether it works (the output might be garbled).
The second is a new argument to specify the encoding:
python KindleClippings.py -encoding=cp1252
It might solve your problem?
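The two fixes could be sketched roughly as follows. This is not the script's actual code: it assumes that "ignoring the errors" corresponds to errors="ignore" in open(), and that the -encoding argument is handled with argparse. The names build_parser and read_text are hypothetical.

```python
import argparse
import tempfile
from pathlib import Path


def build_parser():
    """Hypothetical parser exposing only the -encoding option."""
    parser = argparse.ArgumentParser(description="Sketch of an encoding option")
    parser.add_argument("-encoding", default="utf8",
                        help="encoding used to read the clippings file")
    return parser


def read_text(path, encoding="utf8", ignore_errors=False):
    """Read a file; optionally drop bytes the codec cannot decode."""
    errors = "ignore" if ignore_errors else "strict"
    with open(path, "r", encoding=encoding, errors=errors) as f:
        return f.read()


# Same form as the command line shown above: -encoding=cp1252
args = build_parser().parse_args(["-encoding=cp1252"])

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "clippings.txt"
    # UTF-8 bytes for curly-quoted "hola"; the last byte is 0x9d
    sample.write_bytes("\u201chola\u201d".encode("utf-8"))
    # With errors ignored the read succeeds, but the quotes come out garbled
    garbled = read_text(sample, encoding=args.encoding, ignore_errors=True)
    # Reading the same file as UTF-8 gives the intended text
    clean = read_text(sample, encoding="utf8")
```

If the file really is UTF-8, reading it as cp1252 with errors ignored avoids the crash but mangles non-ASCII characters, so choosing the matching encoding is usually the better of the two options.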