data-wrangling
UnicodeDecodeError: 'gbk' codec can't decode byte when running parse_pdf_text.py
Hi, thank you for your wonderful book on data wrangling. I ran into an issue when running parse_pdf_text.py from chapter 5 in Anaconda (Python 3.5). The IDE shows the following error message:
Traceback (most recent call last):
  File "<ipython-input-10-957ab6bc6f5e>", line 39, in <module>
    for line in openfile:
UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 46: illegal multibyte sequence
It looks like the code opened the file in text mode with the "gbk" encoding. Should it be opened in binary mode instead? I'm not sure. How can I fix this problem? Thank you.
Hi there,
Can you change this line near the top of the file:
openfile = open(pdf_txt, 'r')
to this:
openfile = open(pdf_txt, 'rb')
And let me know if that works better? Thanks!
-kjam
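For anyone hitting the same error, here is a minimal sketch of why the change helps. In Python 3, open(pdf_txt, 'r') decodes the file with the platform's default encoding, which is evidently gbk on the reporter's machine. Opening with 'rb' skips decoding entirely. The sample bytes and the cp1252 encoding below are assumptions for illustration only (0x93 is a left curly quote in Windows-1252). The real encoding of the chapter 5 text file is not stated in this thread.

```python
import os
import tempfile

# Hypothetical sample data: 0x93 and 0x94 are curly quotes in Windows-1252,
# but 0x93 followed by a space is not a valid GBK byte sequence.
sample = b'\x93 Total \x94 child labour\n'

# Write the sample to a temporary file standing in for pdf_txt.
with tempfile.NamedTemporaryFile(delete=False, suffix='.txt') as tmp:
    tmp.write(sample)
    pdf_txt = tmp.name

# Text mode with the gbk codec reproduces the reported failure.
try:
    with open(pdf_txt, 'r', encoding='gbk') as openfile:
        openfile.readline()
    gbk_failed = False
except UnicodeDecodeError:
    gbk_failed = True

# Binary mode, as suggested above: no decoding happens, lines are bytes.
with open(pdf_txt, 'rb') as openfile:
    raw_lines = [line for line in openfile]

# Alternative: stay in text mode but name the encoding explicitly.
# cp1252 is an assumption here, not a confirmed property of the file.
with open(pdf_txt, 'r', encoding='cp1252') as openfile:
    text_lines = [line for line in openfile]

os.remove(pdf_txt)
```

Note that in binary mode each line is a bytes object, so any later string comparisons in the script would need bytes literals (for example b'Total') or an explicit line.decode(...) call.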