htmlquery
Is it supposed to return the body node?
For some reason htmlquery.Find(parse, "/html/body//*") returns the body node too. I've tested the same expression with https://codebeautify.org/Xpath-Tester as well as with $x("/html/body//*") in the browser console, and neither seems to include the body node. What am I missing?
First get the /html/body node, then traverse all children of the body node via FirstChild and NextSibling (see https://godoc.org/golang.org/x/net/html#Node), as in the code below:
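// FindOne returns the first matching node, or nil if nothing matches.
body := htmlquery.FindOne(doc, "//body")
if body == nil {
	// handle a missing body element
	// ...
}
// Walk the immediate children of body.
for c := body.FirstChild; c != nil; c = c.NextSibling {
	fmt.Println(c)
}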
@zhengchun I honestly don't understand why you closed the issue. I wasn't looking for a workaround. The query specifically says /html/body//* and requests all immediate and non-immediate children of body. So it seems weird that it returns the body node itself, doesn't it?
Sorry, I thought the question was about getting the child nodes of body. I've reopened the issue.
I checked again and I understand what you mean. My guess is that /html/body//* is not a fully valid expression in htmlquery. If you want to query all children of body but exclude body itself, you should change it to /html/body/*.
If you use // in htmlquery, it means descendant-or-self::*. See https://github.com/antchfx/xpath:
a//b : For each node matching a, add the descendant nodes matching b to the result.
In my opinion, an expression like "/a//b" is not a valid query expression.
Since it seems to be valid syntax, which can be confirmed with $x(<expr>) in the browser, is it safe to say that this is a bug? It's really hard to find good XPath documentation with examples, apart from the spec, which seems to say that both / and // are evaluated relative to the context set by the preceding part of the path. That would mean /html/body//* locates all descendants of /html/body, not just its immediate children.
Solved. https://github.com/antchfx/xpath/commit/ed2f6ee6ff2a4066b9ea6b35db2868d4d2154cb5