html2csv

Let beautifulsoup guess the input codec

Open • JeffCarpenter opened this issue 4 years ago • 1 comment

Unlike pathlib's read_text(), which decodes the file with a single fixed encoding, BeautifulSoup can detect the encoding of its input and handle several text codecs, so we let it do the decoding.

Addresses issue #5
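
The idea described above can be sketched as follows. This is a minimal illustration, not the actual diff from this PR: `load_soup` and the sample file are hypothetical, and it assumes the `beautifulsoup4` package is installed. It relies on BeautifulSoup accepting raw bytes and detecting their encoding on its own.

```python
from pathlib import Path
import tempfile

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def load_soup(path: Path) -> BeautifulSoup:
    """Hypothetical helper: parse an HTML file without decoding it first."""
    # read_bytes() performs no decoding, so it cannot raise UnicodeDecodeError.
    raw = path.read_bytes()
    # Given bytes instead of str, BeautifulSoup tries to detect the encoding
    # itself (declared charset first, then fallbacks).
    return BeautifulSoup(raw, "html.parser")


# Demo input (an assumption for illustration): a Latin-1 encoded table,
# where "\u00f3" (o with acute accent) is stored as the single byte 0xf3.
html = (
    '<html><head><meta charset="iso-8859-1"></head>'
    "<body><table><tr><td>canci\u00f3n</td></tr></table></body></html>"
)

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.html"
    sample.write_bytes(html.encode("latin-1"))
    soup = load_soup(sample)

cell_text = soup.find("td").get_text()
print(cell_text)               # text of the table cell, decoded by BeautifulSoup
print(soup.original_encoding)  # the encoding BeautifulSoup settled on
```

Detection is a best-effort guess, so a caller who knows the encoding in advance may still prefer to pass it explicitly via BeautifulSoup's `from_encoding` argument.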

JeffCarpenter • Nov 24 '21 01:11

Any chance of getting this merged? This PR solved my problem reading non-UTF-8 input:

Traceback (most recent call last):
  File "/opt/homebrew/bin/html2csv", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/html2csv/__main__.py", line 41, in main
    html_doc = path.read_text()
               ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.3/Frameworks/Python.framework/Versions/3.11/lib/python3.11/pathlib.py", line 1059, in read_text
    return f.read()
           ^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 532: invalid continuation byte
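
For reference, an error of this kind can be reproduced with the standard library alone. The sketch below assumes, purely for illustration, that the input was Latin-1 encoded (where byte 0xf3 is an accented "o"); the original file's real encoding is not stated in this thread. The encoding is passed explicitly so the result does not depend on the machine's locale.

```python
from pathlib import Path
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.html"
    # Hypothetical input: "\u00f3" becomes the single byte 0xf3 in Latin-1.
    sample.write_bytes("<td>canci\u00f3n</td>".encode("latin-1"))

    try:
        # Decoding as UTF-8 fails: 0xf3 starts a multi-byte sequence,
        # but the byte after it is not a continuation byte.
        sample.read_text(encoding="utf-8")
        error = None
    except UnicodeDecodeError as exc:
        error = exc

    # Reading bytes never decodes, so it always succeeds.
    raw = sample.read_bytes()

print(error)
```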

fernandomora • Apr 21 '23 01:04