goquery could not parse <!--!-->

Open gonejack opened this issue 2 years ago • 1 comments

package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/PuerkitoBio/goquery"
)

func main() {
	var html = `
<html>
<body>
<!--!-->
<div>text</div>
</body>
</html>
`
	doc, err := goquery.NewDocumentFromReader(strings.NewReader(html))
	if err != nil {
		log.Fatalln(err)
	}
	fmt.Println(doc.Find("div").Length())
}

output: 0

gonejack, Apr 05 '22 09:04

Hello,

Thanks for raising this. It looks like you found an issue in Go's x/net/html package. Here's a program that reproduces the issue with just the html parser:

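package main

import (
	"log"
	"os"
	"strings"

	"golang.org/x/net/html"
)

func main() {
	var data = `
<html>
<body>
<!--!-->
<div>text</div>
</body>
</html>
`
	// Parse with x/net/html directly, bypassing goquery.
	n, err := html.Parse(strings.NewReader(data))
	if err != nil {
		log.Fatalln(err)
	}
	// Render the parsed tree back out to show how the parser read the input.
	if err := html.Render(os.Stdout, n); err != nil {
		log.Fatalln(err)
	}
}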

// Output:
// <html><head></head><body>
// <!--!-->
// <div>text</div>
// </body>
// </html>
// --></body></html>

As you can see, the parser "fixes" your HTML by extending the comment all the way to the end of the body. It doesn't see that the comment is closed immediately by the `-->` that follows the `!`. Based on the WHATWG spec, it looks like the comment you have should be valid: https://html.spec.whatwg.org/multipage/syntax.html#comments
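
To make that reading of the spec concrete, here is a minimal sketch using only the standard library. `commentBody` is a hypothetical helper written for this illustration, not part of goquery or x/net/html, and it deliberately ignores the spec's edge cases (such as `<!-->` or `--!>`). It only treats the first `-->` after the opening `<!--` as the end of the comment.

```go
package main

import (
	"fmt"
	"strings"
)

// commentBody is a hypothetical, simplified helper: it returns the text of an
// HTML comment that starts at the beginning of s, and whether a closing "-->"
// was found. It does not implement the full WHATWG tokenizer rules.
func commentBody(s string) (string, bool) {
	const open, end = "<!--", "-->"
	if !strings.HasPrefix(s, open) {
		return "", false
	}
	rest := s[len(open):]
	i := strings.Index(rest, end)
	if i < 0 {
		return "", false // comment is never closed
	}
	return rest[:i], true
}

func main() {
	body, ok := commentBody("<!--!-->\n<div>text</div>")
	fmt.Printf("%q %v\n", body, ok) // prints: "!" true
}
```

Under this simplified reading, the comment body is just `!` and the `<div>` that follows sits outside the comment, which is what the rendered output should have shown.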

So it looks like an x/net/html bug, and in fact it seems an issue for this already exists: https://github.com/golang/go/issues/37771. Feel free to add a comment there for your case.

I'll leave this issue open until there is a resolution in x/net/html.

Martin

mna, Apr 05 '22 23:04

The x/net issue is now fixed and merged (https://github.com/golang/go/issues/37771). This now works as expected, so I'm closing this issue.

mna, Nov 17 '22 02:11