html_to_plain_text

various improvements

Open • gsar opened this issue 8 years ago • 1 comment

Introduces the following changes:

  • ignores noscript tags in HTML

  • treats Unicode \u00a0 (non-breaking space) and similar characters as whitespace wherever whitespace is matched

  • ignores whitespace when avoiding duplicate URLs in the output

  • adds a :show_links option to suppress outputting links (default behavior remains unchanged)
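A rough sketch of the four behaviors listed above, in plain Ruby. This is not the code in this change and does not call into the gem; the names UNICODE_SPACE, squish, strip_noscript, same_link? and render_link are made up for illustration, and the "text (url)" output format is an assumption about how links are rendered.

```ruby
# Illustrative sketch only; not the gem's actual implementation.
# Every name below is hypothetical.

# [[:space:]] is Unicode-aware in Ruby, so it matches \u00a0 as well as
# ASCII whitespace (plain \s does not match \u00a0).
UNICODE_SPACE = /[[:space:]]+/

# Collapse runs of any whitespace, including non-breaking spaces.
def squish(text)
  text.gsub(UNICODE_SPACE, " ").strip
end

# Drop <noscript> elements and their content (regex-based, for brevity).
def strip_noscript(html)
  html.gsub(%r{<noscript\b[^>]*>.*?</noscript>}mi, "")
end

# A link counts as a duplicate when its text equals its URL
# once all whitespace is ignored.
def same_link?(text, url)
  text.gsub(UNICODE_SPACE, "") == url.gsub(UNICODE_SPACE, "")
end

# Assumed output format: "text (url)". With show_links: false the URL is
# suppressed; the default (true) keeps the existing behavior.
def render_link(text, url, show_links: true)
  label = squish(text)
  return label if !show_links || same_link?(label, url)
  "#{label} (#{url})"
end

puts squish("a\u00a0\u00a0b ")
puts strip_noscript("<p>Hi</p><noscript>Enable JS</noscript>")
puts render_link("Example", "http://example.com")
puts render_link("Example", "http://example.com", show_links: false)
```

Keeping show_links as a keyword argument that defaults to true means existing callers see no change in output unless they opt out.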

I will add tests if these changes seem acceptable.

gsar, Aug 18 '17 20:08

Those changes sound reasonable. Go ahead and add the tests.

bdurand, Aug 19 '17 00:08