goquery
Function to find the selector of a node
// node is an *html.Node within doc
// if ok, sel is a selector string such as `.sidebar-reviews article .content-block a`
sel, ok := doc.FindSelector(node)
Would a function like this be possible?
Hello,
It could be done, but there can be many valid selector strings for a given node, and there's no guarantee that a given selector would be unique (that is, the selector could return many matches, not just the one for that specific node). I guess it could be made unique by adding :nth-child pseudo-classes everywhere, but I'm not sure that would be super useful.
What do you want to achieve exactly?
Martin
Hello, I have thought about it. What I want to achieve is this: when I search an HTML DOM tree and find a node that is useful to me by some heuristic, I would store its selector in a database for future use.
For example, I want to crawl the newest article URLs of thousands of blogs. The HTML DOM tree differs from blog to blog, so when I add a blog index URL, I want to store in my db the selector of every <a href="{newest article url }"> node, found by some algorithm.
But I notice that:
- goquery is a library for selecting HTML nodes like jquery, so this feature may be a little outside goquery's goal.
- The selector is not unique for a given node. (For now, I use a recursive parent search to get the selector of a node, adding :nth-child when the parent has siblings to make the selector unique.) It would be nice if it worked just like Chrome does.
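The recursive parent search described above can be sketched roughly as follows. This is only an illustration, not goquery code: `Node` is a small stand-in that mirrors the linkage fields of `html.Node` from golang.org/x/net/html so the example runs without dependencies, and `UniqueSelector`, `appendChild` and `buildSample` are hypothetical names. It adds :nth-child only where an element has element siblings, which is one reading of the approach above.

```go
package main

import (
	"fmt"
	"strings"
)

// NodeType and Node are stand-ins (an assumption for this sketch) that
// mirror the fields of golang.org/x/net/html's Node used below.
type NodeType int

const (
	TextNode NodeType = iota
	ElementNode
)

type Node struct {
	Type                                         NodeType
	Data                                         string // tag name for elements
	Parent, FirstChild, PrevSibling, NextSibling *Node
}

// appendChild links child as the last child of parent and returns child.
func appendChild(parent, child *Node) *Node {
	child.Parent = parent
	if parent.FirstChild == nil {
		parent.FirstChild = child
		return child
	}
	last := parent.FirstChild
	for last.NextSibling != nil {
		last = last.NextSibling
	}
	last.NextSibling, child.PrevSibling = child, last
	return child
}

// UniqueSelector (hypothetical) walks from n up to the root element and
// joins the steps with the child combinator. :nth-child is 1-based and
// counts element siblings only.
func UniqueSelector(n *Node) (string, bool) {
	if n == nil || n.Type != ElementNode {
		return "", false
	}
	var parts []string
	for cur := n; cur != nil && cur.Type == ElementNode; cur = cur.Parent {
		if cur.Parent == nil || cur.Parent.Type != ElementNode {
			parts = append(parts, cur.Data) // root element: tag only
			break
		}
		idx, hasSibling := 1, false
		for s := cur.PrevSibling; s != nil; s = s.PrevSibling {
			if s.Type == ElementNode {
				idx++
				hasSibling = true
			}
		}
		for s := cur.NextSibling; s != nil && !hasSibling; s = s.NextSibling {
			hasSibling = s.Type == ElementNode
		}
		part := cur.Data
		if hasSibling {
			part = fmt.Sprintf("%s:nth-child(%d)", cur.Data, idx)
		}
		parts = append(parts, part)
	}
	// Steps were collected leaf-first; reverse them to read root-first.
	for i, j := 0, len(parts)-1; i < j; i, j = i+1, j-1 {
		parts[i], parts[j] = parts[j], parts[i]
	}
	return strings.Join(parts, " > "), true
}

// buildSample builds html > (head, body > (div, div > (text, a, a)))
// and returns the root and the last <a> as the target.
func buildSample() (root, target *Node) {
	el := func(tag string) *Node { return &Node{Type: ElementNode, Data: tag} }
	root = el("html")
	appendChild(root, el("head"))
	body := appendChild(root, el("body"))
	appendChild(body, el("div"))
	div := appendChild(body, el("div"))
	appendChild(div, &Node{Type: TextNode, Data: "hi"})
	appendChild(div, el("a"))
	target = appendChild(div, el("a"))
	return root, target
}

func main() {
	_, target := buildSample()
	sel, ok := UniqueSelector(target)
	fmt.Println(sel, ok)
	// prints: html > body:nth-child(2) > div:nth-child(2) > a:nth-child(2) true
}
```

A selector built this way is tied to the exact tree shape, so it will stop matching if the page layout changes.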
Thanks for the context, yeah I see what you mean, I think it makes sense. I'm gonna try to give this a shot, maybe this weekend (no promises :). I'll take a look at how Chrome handles this, but I think another option is to have an array of html.Node indices to traverse the tree (instead of a CSS selector string). Maybe offer both options.
I implemented the PathForNode(*html.Node) []int and NodeAtPath([]int) *html.Node functions in the wip-selector branch. That's not exactly what you wanted, but it's the same-ish feature. It works well, though it's not so nice to use because it works with *html.Node instead of *goquery.Selection (however, it can still be useful, as it is probably more efficient to match, retrieve and store than the string selector version will be).
I'll try to add the selector string thing at some point, which will fit better with the rest of the goquery API.
Thanks for the quick implementation. I saw the commit, and PathForNode gives a good path marker for an html.Node in a DOM tree. Though it's not exactly what I want, I could use it as a pointer that can be saved in a database, so it's useful. I will keep waiting for your better implementation~