html2markdown
Fails to encode (replace) emojis
Received this from an Apple mail client writing emojis in HTML mails. I know, I know… but the tool could handle this better.
% echo "��" | html2markdown - windows-1252
Traceback (most recent call last):
  File "/usr/bin/html2markdown", line 33, in <module>
    sys.exit(load_entry_point('html2text==2020.1.16', 'console_scripts', 'html2text')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/html2text/cli.py", line 306, in main
    sys.stdout.write(h.handle(html))
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed
For a start, it would help to know a bit more about where the error occurs, i.e. maybe print some context?
Or have something like --encode-errors=replace?
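To make both suggestions concrete, here is a rough sketch in plain Python of what I have in mind. It is not html2text code: `encode_output` and `describe_encode_error` are hypothetical helper names, and the `errors` parameter only stands in for the proposed `--encode-errors` option, which as far as I know does not exist today. It relies only on the standard `str.encode` error handlers and the attributes of `UnicodeEncodeError`.

```python
def describe_encode_error(exc: UnicodeEncodeError, margin: int = 20) -> str:
    """Build a message that shows where encoding failed, with nearby text."""
    lo = max(exc.start - margin, 0)
    hi = exc.end + margin
    # ascii() escapes the offending characters so the message itself is printable.
    context = ascii(exc.object[lo:hi])
    return (
        f"cannot encode characters {exc.start}-{exc.end - 1} as {exc.encoding}: "
        f"{exc.reason}; context: {context}"
    )


def encode_output(text: str, encoding: str = "utf-8", errors: str = "strict") -> bytes:
    """Encode text for output (hypothetical helper).

    errors is passed straight to str.encode, so "replace" substitutes "?"
    for anything the codec cannot encode, while "strict" fails with context.
    """
    try:
        return text.encode(encoding, errors=errors)
    except UnicodeEncodeError as exc:
        raise ValueError(describe_encode_error(exc)) from exc


if __name__ == "__main__":
    broken = "hello \ud83d\ude00 world"  # lone surrogates, as in the traceback
    print(encode_output(broken, errors="replace"))
    try:
        encode_output(broken)
    except ValueError as err:
        print(err)
```

With `errors="replace"` the surrogates are swapped for `?` instead of aborting the whole conversion, and in strict mode the message at least shows the surrounding text, so the offending part of the mail is easy to find.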