Fails to encode (replace) emojis

Open madduck opened this issue 1 year ago • 0 comments

Received this from an Apple mail client writing emojis in HTML mails. I know, I know… but the tool could handle this better.

% echo "��" | html2markdown - windows-1252
Traceback (most recent call last):
  File "/usr/bin/html2markdown", line 33, in <module>
    sys.exit(load_entry_point('html2text==2020.1.16', 'console_scripts', 'html2text')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/html2text/cli.py", line 306, in main
    sys.stdout.write(h.handle(html))
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed

For a start, it would help to know a bit more about where the error occurs, e.g. by printing some context around the offending characters.
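To illustrate the kind of context that could be printed, here is a minimal sketch. It is not html2text's code; `describe_encode_error` and its `width` parameter are hypothetical names. It relies only on the standard `start`, `end`, `object` and `reason` attributes of `UnicodeEncodeError`.

```python
def describe_encode_error(exc, width=20):
    """Summarise a UnicodeEncodeError with the text around the failure.

    Hypothetical helper, not part of html2text.
    """
    # exc.end is exclusive, so the last offending index is exc.end - 1.
    lo = max(exc.start - width, 0)
    hi = min(exc.end + width, len(exc.object))
    # ascii() escapes the unencodable characters so the message itself
    # can always be printed.
    snippet = ascii(exc.object[lo:hi])
    return f"{exc.reason} at characters {exc.start}-{exc.end - 1}: {snippet}"


try:
    # Two lone surrogate code points, as in the traceback above.
    "\ud83d\ude00 hello".encode("utf-8")
except UnicodeEncodeError as exc:
    message = describe_encode_error(exc)
```

Wrapping the final write in such a handler would turn the bare traceback into a message that points at the offending part of the input.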

Or have something like --encode-errors=replace?
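As a rough sketch of what such an option might do (the flag does not exist today, and `make_lenient_writer` is a hypothetical name), the output stream could be opened with a non-strict error handler so that unencodable characters are substituted instead of raising:

```python
import io


def make_lenient_writer(binary_stream, encoding="utf-8", errors="replace"):
    """Wrap a binary stream so unencodable characters are replaced.

    Hypothetical helper; `errors` is what a flag such as
    --encode-errors could pass through.
    """
    return io.TextIOWrapper(binary_stream, encoding=encoding, errors=errors)


raw = io.BytesIO()  # stands in for the real output stream
out = make_lenient_writer(raw)
out.write("emoji: \ud83d\ude00")  # lone surrogates, as in the report
out.flush()
written = raw.getvalue()  # each surrogate is written as "?"
```

For the real standard output, `sys.stdout.reconfigure(errors="replace")` (Python 3.7+) has the same effect without creating a new wrapper.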

madduck, Sep 18 '23 21:09