CsQuery
HTML comment nodes are returned as part of the .Text() method output
CsQuery Version: 1.3.4
.NET Framework: 4.5
Test case (VB):
Dim div As CsQuery.CQ = New CsQuery.CQ("<div>This is not a comment<!-- , but this is a comment -->, nor is this a comment.</div>")
Dim html As String = div.Html()
Dim text As String = div.Text()
html
returns:
"This is not a comment<!-- , but this is a comment -->, nor is this a comment."
text
returns:
"This is not a comment , but this is a comment , nor is this a comment."
jQuery, by way of comparison, returns the text content without the comment content:
console.log($('<div>This is not a comment<!-- , but this is a comment -->, nor is this a comment.</div>').text());
"This is not a comment, nor is this a comment."
My workaround was to instantiate the CsQuery.CQ object using the CsQuery.HtmlParsingOptions.IgnoreComments parsing option.
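For anyone who wants to try the same workaround, here is a minimal sketch in C#. It assumes the `CQ.Create(html, parsingMode, parsingOptions)` overload available in CsQuery 1.3.x. The class name `CommentFreeText` and the method `Extract` are hypothetical names used only for illustration.

```csharp
using CsQuery;

public static class CommentFreeText
{
    // Hypothetical helper: parses the markup with comment nodes discarded,
    // then returns the text of the resulting selection.
    public static string Extract(string html)
    {
        // IgnoreComments tells the parser to drop comment nodes,
        // so Text() has no comment content to pick up.
        CQ dom = CQ.Create(html, HtmlParsingMode.Auto, HtmlParsingOptions.IgnoreComments);
        return dom.Text();
    }
}
```

Note that this discards comments at parse time, so they are also absent from Html(). If the comments need to survive in the markup, removing them from a copy of the DOM before calling Text() may be a better fit.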
Thank you for this much needed library.
This issue is still present. Should this be fixed?
My inclination would be to fix it, since the purpose of this library seems to be to replicate jQuery's functionality, and this method behaves differently in jQuery.
Comments are still being read by Text(). Sometimes an element contains IE conditional comments, which are then incorrectly returned as text, e.g. <!--[if gte mso 9]>
I've made two extension methods to strip comments:
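using System.Collections.Generic;
using CsQuery;

// Extension methods must live in a static class; the class name is arbitrary.
public static class CsQueryCommentExtensions
{
    // Strips comment nodes from every element in the selection.
    public static CQ StripComments(this CQ cq)
    {
        if (cq == null) return cq;
        foreach (var element in cq)
        {
            element.StripComments();
        }
        return cq;
    }

    // Recursively removes all comment nodes beneath the given node.
    public static IDomObject StripComments(this IDomObject node)
    {
        if (node == null || node.ChildNodes == null) return node;
        // Collect first, remove afterwards, so the list is not modified while enumerating it.
        List<IDomObject> commentNodes = new List<IDomObject>();
        foreach (var childNode in node.ChildNodes)
        {
            if (childNode.NodeType == NodeType.COMMENT_NODE)
            {
                commentNodes.Add(childNode);
            }
            if (childNode.ChildNodes != null && childNode.ChildNodes.Count > 0)
            {
                childNode.StripComments();
            }
        }
        foreach (var commentNode in commentNodes)
        {
            node.ChildNodes.Remove(commentNode);
        }
        return node;
    }
}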