HTML Comment nodes are retrieved as part of the .Text() method

Open pete-ppc opened this issue 9 years ago • 4 comments

CsQuery Version: 1.3.4
.NET Framework: 4.5

Test case (VB):

Dim div As CsQuery.CQ = New CsQuery.CQ("<div>This is not a comment<!-- , but this is a comment -->, nor is this a comment.</div>")
Dim html As String = div.Html()
Dim text As String = div.Text()

html returns:

"This is not a comment<!-- , but this is a comment -->, nor is this a comment."

text returns:

"This is not a comment , but this is a comment , nor is this a comment."

jQuery, by way of comparison, returns the text content without the comment content:

console.log($('<div>This is not a comment<!-- , but this is a comment -->, nor is this a comment.</div>').text());
"This is not a comment, nor is this a comment."
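For illustration, the jQuery result above is what you get from a text extraction that concatenates text nodes and skips comment nodes. The following is only a minimal sketch of that idea, not jQuery's or CsQuery's actual code: the plain-object tree and the `getText` helper are hypothetical stand-ins, and only the `nodeType` values (1 = element, 3 = text, 8 = comment) come from the DOM specification.

```javascript
// nodeType values as defined by the DOM specification.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
const COMMENT_NODE = 8;

// Hypothetical hand-built tree mirroring the <div> from the test case.
const div = {
  nodeType: ELEMENT_NODE,
  childNodes: [
    { nodeType: TEXT_NODE, nodeValue: "This is not a comment" },
    { nodeType: COMMENT_NODE, nodeValue: " , but this is a comment " },
    { nodeType: TEXT_NODE, nodeValue: ", nor is this a comment." },
  ],
};

// Hypothetical helper: return the value of text nodes, recurse into
// elements, and contribute nothing for any other node type (comments included).
function getText(node) {
  if (node.nodeType === TEXT_NODE) return node.nodeValue;
  if (node.nodeType === ELEMENT_NODE) {
    return node.childNodes.map(getText).join("");
  }
  return "";
}

console.log(getText(div)); // "This is not a comment, nor is this a comment."
```

The output reported for CsQuery's Text() is consistent with the comment node's value being concatenated along with the text nodes.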

My workaround was to instantiate the CsQuery.CQ object using the CsQuery.HtmlParsingOptions.IgnoreComments parsing option.

Thank you for this much needed library.

pete-ppc avatar Oct 27 '14 13:10 pete-ppc

This issue is still present. Should this be fixed?

marcselman avatar Jun 22 '15 15:06 marcselman

My inclination would be to fix it, since the purpose of this library seems to be to replicate the functionality of jQuery, and this method behaves differently in jQuery.

pete-ppc avatar Jun 22 '15 15:06 pete-ppc

Comments are still being read by Text(). Sometimes an element contains IE conditional comments, and their contents are incorrectly returned as part of the text: <!--[if gte mso 9]>...

tariqporter avatar Jul 16 '15 23:07 tariqporter

I've made two extension methods to strip comments:

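using System.Collections.Generic;
using CsQuery;

// Extension methods must be declared in a static class; the class name is arbitrary.
public static class CommentStrippingExtensions
{
    // Removes comment nodes from every element in the selection, in place.
    public static CQ StripComments(this CQ cq)
    {
        if (cq == null) return cq;

        foreach (var element in cq)
        {
            element.StripComments();
        }

        return cq;
    }

    // Recursively removes comment nodes beneath the given node, in place.
    public static IDomObject StripComments(this IDomObject node)
    {
        if (node == null || node.ChildNodes == null) return node;

        // Collect first and remove afterwards, so the child list is not
        // modified while it is being enumerated.
        List<IDomObject> commentNodes = new List<IDomObject>();
        foreach (var childNode in node.ChildNodes)
        {
            if (childNode.NodeType == NodeType.COMMENT_NODE)
            {
                commentNodes.Add(childNode);
            }

            if (childNode.ChildNodes != null && childNode.ChildNodes.Count > 0)
            {
                childNode.StripComments();
            }
        }
        foreach (var commentNode in commentNodes)
        {
            node.ChildNodes.Remove(commentNode);
        }

        return node;
    }
}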

marcselman avatar Jul 17 '15 07:07 marcselman