goquery
goquery could not parse <!--!-->
package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/PuerkitoBio/goquery"
)

func main() {
	var html = `
<html>
<body>
<!--!-->
<div>text</div>
</body>
</html>
`
	doc, err := goquery.NewDocumentFromReader(strings.NewReader(html))
	if err != nil {
		log.Fatalln(err)
	}
	fmt.Println(doc.Find("div").Length())
}
output: 0
Hello,
Thanks for raising this. It looks like you found an issue in Go's x/net/html package. Here's a reproducible program that illustrates the issue using just the html parser:
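package main

import (
	"log"
	"os"
	"strings"

	"golang.org/x/net/html"
)

func main() {
	var data = `
<html>
<body>
<!--!-->
<div>text</div>
</body>
</html>
`
	// Parse with x/net/html directly, without goquery.
	n, err := html.Parse(strings.NewReader(data))
	if err != nil {
		log.Fatalln(err)
	}
	// Render the parsed tree back out to see how the parser interpreted it.
	if err := html.Render(os.Stdout, n); err != nil {
		log.Fatalln(err)
	}
}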
// Output:
// <html><head></head><body>
// <!--!-->
// <div>text</div>
// </body>
// </html>
// --></body></html>
As you can see, the parser "fixes" your HTML by extending the comment all the way to the end of the body. It does not recognize that the comment closes immediately, so the <div> ends up inside the comment and is never parsed as an element. Based on the WHATWG spec, the comment you have should be valid: https://html.spec.whatwg.org/multipage/syntax.html#comments
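To illustrate why the comment should close immediately, here is a minimal sketch of how the end of a comment can be located under the spec's rules. The function name commentEnd is hypothetical and is not part of x/net/html or goquery. It is a simplification of the tokenizer's state machine (it only covers the `-->` and `--!>` closers plus the abrupt `<!-->` and `<!--->` forms) and is not how the real parser is implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// commentEnd is a hypothetical helper (not an x/net/html API). Given input
// that starts with "<!--", it returns the index just past the end of the
// comment, or -1 if the input does not start a comment or never closes it.
func commentEnd(s string) int {
	const open = "<!--"
	if !strings.HasPrefix(s, open) {
		return -1
	}
	rest := s[len(open):]

	// Abrupt closings: "<!-->" and "<!--->" end the comment right away.
	if strings.HasPrefix(rest, ">") {
		return len(open) + 1
	}
	if strings.HasPrefix(rest, "->") {
		return len(open) + 2
	}

	// Otherwise the comment runs to the earliest "-->" or "--!>".
	end := -1
	for _, closer := range []string{"-->", "--!>"} {
		if i := strings.Index(rest, closer); i >= 0 {
			if e := len(open) + i + len(closer); end == -1 || e < end {
				end = e
			}
		}
	}
	return end
}

func main() {
	src := "<!--!--><div>text</div>"
	end := commentEnd(src)
	// The comment text is just "!", so the comment ends after 8 bytes
	// and the <div> stays outside it.
	fmt.Printf("%d %q\n", end, src[end:]) // prints: 8 "<div>text</div>"
}
```

Under these rules the comment text is the single character "!", so the <div> that follows should be parsed as a normal element.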
So this looks like an x/net/html bug, and in fact an issue for it already exists: https://github.com/golang/go/issues/37771. Feel free to add a comment there describing your case.
I'll leave this issue open until there is a resolution in x/net/html.
Martin
The x/net issue (https://github.com/golang/go/issues/37771) has now been fixed and merged, and this is working as expected, so I'm closing this.