html-agility-pack
&nbsp; is not removed from the InnerText
1. Description
Here I'm trying to strip HTML tags and attributes from a text. Most of the tags are removed, but &nbsp; stays in the text.
3. Fiddle or Project
https://dotnetfiddle.net/haBumr
public static string StripHtmlTags(this string input)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(input ?? "");
    return doc.DocumentNode.InnerText;
}
Input text:
<p>This is a test string.&nbsp;</p>
Output text:
This is a test string.&nbsp;
Is there any way I can get the text as I see in a browser?
- HAP version: 1.11.42
- NET version (.net core 2.2, .net core 3.1, etc.)
See the "Decode and strip HTML" example over here: https://html-agility-pack.net/online-examples
However, contrary to that example code, I would strongly suggest doing the entity decoding after getting the inner text, not before loading the HTML data into HtmlAgilityPack.
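A minimal sketch of that ordering, assuming the entity in question is &nbsp; and using System.Net.WebUtility.HtmlDecode for the decoding step (the class name HtmlTextExtensions is made up for illustration). Decoding after InnerText matters because escaped markup such as &lt;b&gt; would otherwise become a real tag before parsing and be stripped along with the rest:

```csharp
using System.Net;
using HtmlAgilityPack;

// Hypothetical helper class; an extension method must live in a static class.
public static class HtmlTextExtensions
{
    // Strips the tags first, then decodes entities in the remaining text.
    public static string StripHtmlTags(this string input)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(input ?? "");

        // InnerText still contains raw entities such as "&nbsp;" or "&lt;".
        string text = doc.DocumentNode.InnerText;

        // Decode last, so escaped markup like "&lt;b&gt;" is never parsed as a tag.
        return WebUtility.HtmlDecode(text);
    }
}
```

Note that &nbsp; decodes to the non-breaking space character (U+00A0), not an ordinary space. If a plain space is wanted, something like .Replace('\u00A0', ' ') on the result should do it.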
Great. I had already figured out that decoding the HTML would work. But I thought there might be a way to make InnerText decode the HTML by providing some flag while loading it. Thank you for your help.
How do I decode when using LoadFromWebAsync?
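One possible approach, sketched under the assumption that the same "decode after InnerText" advice applies: LoadFromWebAsync returns an ordinary HtmlDocument, so the decoding step can be applied to its InnerText in the same way. The class and method names below (WebTextLoader, DecodeText, GetDecodedTextAsync) are made up for illustration.

```csharp
using System.Net;
using System.Threading.Tasks;
using HtmlAgilityPack;

// Hypothetical helper; not part of HtmlAgilityPack.
public static class WebTextLoader
{
    // Decodes entities in the text of an already loaded document.
    public static string DecodeText(HtmlDocument doc)
    {
        return WebUtility.HtmlDecode(doc.DocumentNode.InnerText);
    }

    // Downloads the page, then decodes its text content.
    public static async Task<string> GetDecodedTextAsync(string url)
    {
        var web = new HtmlWeb();
        HtmlDocument doc = await web.LoadFromWebAsync(url);
        return DecodeText(doc);
    }
}
```

Usage would look like: string text = await WebTextLoader.GetDecodedTextAsync(url);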