html_to_plain_text
various improvements
Introduces the following changes:
- ignores noscript tags in HTML
- treats Unicode \u00a0 and similar characters as whitespace when considering space characters for various operations
- ignores whitespace when avoiding duplicate URLs in the output
- adds a :show_links option to suppress outputting links (default behavior remains unchanged)
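
To make the whitespace and link changes concrete, here is a rough sketch of the intended behavior. It is not the code from the patch: the helper names normalize_space and render_link, and the "text ( url )" output format, are assumptions made for illustration only.

```ruby
# Hypothetical helpers; names and output format are assumptions,
# not taken from the actual patch.

# POSIX bracket [[:space:]] is Unicode-aware in Ruby, so it matches
# \u00a0 (no-break space), which plain \s does not.
UNICODE_SPACE = /[[:space:]]+/

# Collapse runs of any Unicode whitespace to one ASCII space and trim.
def normalize_space(text)
  text.gsub(UNICODE_SPACE, ' ').strip
end

# Render a link as plain text. The URL is omitted when links are
# disabled, or when the link text already equals the URL once all
# whitespace is ignored (avoids printing the same URL twice).
def render_link(text, href, show_links: true)
  label = normalize_space(text)
  url = normalize_space(href)
  same = label.gsub(UNICODE_SPACE, '') == url.gsub(UNICODE_SPACE, '')
  return label if !show_links || same

  "#{label} ( #{url} )"
end
```

With show_links left at its default of true, a link is rendered with its URL as before; passing show_links: false returns only the link text.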
I will add tests if these changes seem acceptable.
Those changes sound reasonable. Go ahead and add the tests.