goquery
html entity escape error
doc, _ := goquery.NewDocumentFromReader(strings.NewReader("<body><!--<p></p>&lt;!--[video]--&gt;--></body>"))
fmt.Println(doc.Html())
// want <html><body><!--<p></p>&lt;!--[video]--&gt;--></body></html> <nil>
// got <html><head></head><body><!--<p></p><!--[video]-->--></body></html> <nil>
Hello,
Good catch. It does look like comments are not escaped when rendered, though I'm not 100% sure whether this is a bug or working as intended per the HTML5 spec: once the comment text has been unescaped, it's impossible to tell that the <p></p> part should stay unescaped while the <!--[video]--> part should be escaped. Either way, this is a (potential) bug that should be reported to https://github.com/golang/go/issues?q=is%3Aissue+x%2Fnet%2Fhtml (for the x/net/html package). goquery does not parse or render HTML itself; it uses the golang.org/x/net/html package for this (an HTML5 parser implemented by the Go team, although not part of the stdlib).
This program reproduces the issue with the x/net/html package directly, without using goquery:
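package main

import (
	"fmt"
	"log"
	"os"
	"strings"

	"golang.org/x/net/html"
)

func main() {
	// The comment body contains escaped entities (&lt; and &gt;).
	root, err := html.Parse(strings.NewReader("<body><!--<p></p>&lt;!--[video]--&gt;--></body>"))
	if err != nil {
		log.Fatal(err)
	}
	c := findCommentNode(root)
	fmt.Printf("Comment Node Type: %d; Data: %q\n\n", c.Type, c.Data)
	fmt.Println("html.Render:")
	if err := html.Render(os.Stdout, c); err != nil {
		log.Fatal(err)
	}
	fmt.Println()
}

// findCommentNode returns the first comment node found in a depth-first
// walk starting at n, or nil if there is none.
func findCommentNode(n *html.Node) *html.Node {
	if n.Type == html.CommentNode {
		return n
	}
	for child := n.FirstChild; child != nil; child = child.NextSibling {
		if found := findCommentNode(child); found != nil {
			return found
		}
	}
	return nil
}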
// Prints:
// Comment Node Type: 4; Data: "<p></p><!--[video]-->"
//
// html.Render:
// <!--<p></p><!--[video]-->-->
If you'd be so kind as to either link back to this issue from the x/net/html one, or just post the link to that issue in a comment here, I'll keep this open until there's a fix or a decision made in the x/net/html repo.
Thanks, Martin
This should be fixed in https://github.com/golang/net/commit/06994584191ebed30077b5176cefe09703557528. Can goquery update its pinned dependency on x/net to >= v0.1.0?
I updated goquery's go.mod file to use the latest x/net dependency, but AFAIK it shouldn't matter: the go.mod of the main module decides which version is built, and a dependency's go.mod only sets the minimum version it needs (that is, you can use the latest version of golang.org/x/net even if goquery's go.mod is not updated).
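One way to check which golang.org/x/net version a build actually ended up with is to read the build info embedded in the binary. This is a minimal sketch using the standard runtime/debug package. depVersion is a hypothetical helper name, and it can only report modules that are recorded in the running binary's build info.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// depVersion (hypothetical helper) returns the version of the module at
// path as recorded in the running binary's build info.
func depVersion(path string) (string, bool) {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		return "", false // binary was built without module support
	}
	for _, dep := range info.Deps {
		if dep.Path != path {
			continue
		}
		if dep.Replace != nil {
			// A replace directive in the main module overrides the version.
			return dep.Replace.Version, true
		}
		return dep.Version, true
	}
	return "", false
}

func main() {
	const mod = "golang.org/x/net"
	if v, ok := depVersion(mod); ok {
		fmt.Println(mod, v)
	} else {
		fmt.Println(mod, "is not recorded in this binary")
	}
}
```

Running go get golang.org/x/net@latest in the main module and rebuilding should change the reported version without any change on goquery's side.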
Closing now that the x/net fix is merged.