html-strip-tags-go
Escaped characters in result
For some HTML content, I get escaped characters in the resulting text content.
originalTest := "<h1>J’aime la visualisation simple et explicite du maillage.</h1><p>Très intéressant. C’est l’application au e-commerce qui m’intéresse maintenant !</p>"
strippedTest := strip.StripTags(originalTest) // => "J’aime la visualisation simple et explicite du maillage.Très intéressant. C’est l’application au e-commerce qui m’intéresse maintenant !"
println(strippedTest)
Do I have to call html.UnescapeString() on the result myself, or is there a function or parameter in the html-strip-tags-go package that does this?
Same here. I started using the idiom html.UnescapeString(strip.StripTags(...)). See https://github.com/openshift-online/ocm-sdk-go/pull/274 for an example.
I think this is worth documenting in the README.
I believe the correct order is to strip tags before interpreting HTML entities. Consider a page talking about HTML tags:
To give your site a 90s look, use the &lt;blink&gt; tag, like this:
<code> &lt;blink&gt; Under construction &lt;/blink&gt; </code>.
- html.UnescapeString(strip.StripTags(...)) gives:
  To give your site a 90s look, use the <blink> tag, like this: <blink> Under construction </blink> .
  which matches what a browser user would actually see.
- strip.StripTags(html.UnescapeString(...)), OTOH, will lose the distinction between an actual < and an escaped &lt; etc., and then strip the tag-like text too:
  To give your site a 90s look, use the tag, like this: Under construction .
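The difference between the two orders can be reproduced with a small sketch. To keep it runnable with only the standard library, it uses naiveStripTags, a hypothetical regexp-based stand-in for strip.StripTags, not the package's real implementation. The helper names stripThenUnescape and unescapeThenStrip and the shortened input string are also assumptions made for illustration. Only html.UnescapeString is the real standard-library call.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
)

// tagPattern naively matches anything shaped like <...>.
var tagPattern = regexp.MustCompile(`<[^>]*>`)

// naiveStripTags is a hypothetical stand-in for strip.StripTags.
// It is NOT the package's implementation, only enough to show the ordering issue.
func naiveStripTags(s string) string {
	return tagPattern.ReplaceAllString(s, "")
}

// stripThenUnescape removes real markup first, then decodes entities,
// so escaped tags such as &lt;blink&gt; survive as visible text.
func stripThenUnescape(s string) string {
	return html.UnescapeString(naiveStripTags(s))
}

// unescapeThenStrip decodes entities first, which turns escaped tags into
// tag-like text that the stripping step then removes.
func unescapeThenStrip(s string) string {
	return naiveStripTags(html.UnescapeString(s))
}

func main() {
	page := "use the &lt;blink&gt; tag: <code>&lt;blink&gt;Under construction&lt;/blink&gt;</code>."

	fmt.Println(stripThenUnescape(page)) // keeps the escaped <blink> text
	fmt.Println(unescapeThenStrip(page)) // loses it
}
```

With the real package, the first helper would presumably become html.UnescapeString(strip.StripTags(s)), which is the idiom mentioned earlier in the thread.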