html-strip-tags-go

Escaped characters in result

Open LeMoussel opened this issue 8 years ago • 1 comments

For some HTML content, HTML entities are left escaped in the resulting text content. Test:

	originalTest := "<h1>J&rsquo;aime la visualisation simple et explicite du maillage.</h1><p>Très intéressant. C&rsquo;est l&rsquo;application au e-commerce qui m’intéresse maintenant !</p>"
	strippedTest := strip.StripTags(originalTest) // => "J&rsquo;aime la visualisation simple et explicite du maillage.Très intéressant. C&rsquo;est l&rsquo;application au e-commerce qui m’intéresse maintenant !"
	println(strippedTest)

Must I call html.UnescapeString() first? Or is there a function or parameter in the html-strip-tags-go package that can do this?

LeMoussel · Jun 07 '17 12:06

Same here; I started using the idiom html.UnescapeString(strip.StripTags(...)). See https://github.com/openshift-online/ocm-sdk-go/pull/274 for an example. I think this is worth documenting in the README.
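
A minimal sketch of that idiom, using only the standard library so it runs on its own. Note the assumptions: `stripTags` here is a hypothetical, deliberately naive regexp stand-in for strip.StripTags (it is not the package's implementation and ignores comments, scripts and malformed markup), and `PlainText` is an illustrative name. In real code you would presumably call strip.StripTags instead.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
)

// tagPattern matches anything that looks like <...>.
// ASSUMPTION: a naive stand-in for strip.StripTags, for illustration only.
var tagPattern = regexp.MustCompile(`<[^>]*>`)

// stripTags is a hypothetical helper that removes tag-like text.
func stripTags(s string) string {
	return tagPattern.ReplaceAllString(s, "")
}

// PlainText strips tags first, then decodes HTML entities
// with the standard library's html.UnescapeString.
func PlainText(s string) string {
	return html.UnescapeString(stripTags(s))
}

func main() {
	in := "<h1>J&rsquo;aime la visualisation.</h1><p>C&rsquo;est int&eacute;ressant.</p>"
	fmt.Println(PlainText(in)) // J’aime la visualisation.C’est intéressant.
}
```

As in the original report, no separator is inserted between the stripped elements, so the heading and paragraph text run together.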

I believe the correct order is to strip tags before interpreting HTML entities. Consider a page that talks about HTML tags:

To give your site a 90s look, use the &lt;blink&gt; tag, like this: 
<code> &lt;blink&gt; Under construction &lt;/blink&gt; </code>.
  • html.UnescapeString(strip.StripTags(...)) gives:

    To give your site a 90s look, use the <blink> tag, like this: <blink> Under construction </blink> .
    

    which matches what a browser user would actually see.

  • strip.StripTags(html.UnescapeString(...)), OTOH, loses the distinction between an actual < and an escaped &lt;, and then strips the tag-like text too:

    To give your site a 90s look, use the tag, like this: Under construction .
    

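The difference between the two orders can be reproduced with a short, self-contained sketch. As an assumption for illustration, `stripTags` is again a naive regexp stand-in for strip.StripTags, and the function names `StripThenUnescape` and `UnescapeThenStrip` are hypothetical. The real package may treat edge cases differently, but the ordering effect should be the same.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
)

// ASSUMPTION: naive stand-in for strip.StripTags, for illustration only.
var tagPattern = regexp.MustCompile(`<[^>]*>`)

func stripTags(s string) string {
	return tagPattern.ReplaceAllString(s, "")
}

// StripThenUnescape removes real tags while entities are still escaped,
// then decodes the entities, so escaped tag-like text survives.
func StripThenUnescape(s string) string {
	return html.UnescapeString(stripTags(s))
}

// UnescapeThenStrip decodes entities first, which turns &lt;blink&gt;
// into something indistinguishable from a real tag, then strips it.
func UnescapeThenStrip(s string) string {
	return stripTags(html.UnescapeString(s))
}

func main() {
	page := "Use the &lt;blink&gt; tag: <code>&lt;blink&gt;Under construction&lt;/blink&gt;</code>."
	fmt.Println(StripThenUnescape(page)) // Use the <blink> tag: <blink>Under construction</blink>.
	fmt.Println(UnescapeThenStrip(page)) // Use the  tag: Under construction.
}
```

One caveat: the result of the strip-then-unescape order is plain text that may contain literal < and >, so it would need escaping again before being embedded back into HTML.
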
cben · Oct 11 '20 19:10