Potential bug with getting parent node ( /.. )

Open dastorda opened this issue 3 years ago • 2 comments

Steps to reproduce: save the content of view-source:https://www.productfrom.com/product/416492-adidas-copa-mundial-soccer-shoes as test.html, then run the following program:

package main

import (
	"fmt"

	"github.com/antchfx/htmlquery"
)

func main() {
	doc, err := htmlquery.LoadDoc("test.html")
	if err != nil {
		panic(err)
	}
	product := htmlquery.FindOne(doc, "//div[contains(@class, 'grid grid-cols-12 gap-0 border-t py-2 px-4')]//div[contains(.,'Product Name')]/..//div[contains(@class, 'text:right')]//span")
	if product != nil {
		fmt.Println("product:", htmlquery.InnerText(product))
	}
}

The program prints nothing, but when I test the XPath expression online, e.g. at https://htmlstrip.com/xpath-tester, it finds a match using this expression: //div[contains(@class, 'grid grid-cols-12 gap-0 border-t py-2 px-4')]//div[contains(.,'Product Name')]/..//div[contains(@class, ' text:right')]//span.

dastorda • Nov 26 '20 14:11

Your Go program will work if you change "text:right" in your XPath expression to "text-right":

$ curl -o test.html https://www.productfrom.com/product/416492-adidas-copa-mundial-soccer-shoes
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 37512    0 37512    0     0  68327      0 --:--:-- --:--:-- --:--:-- 68327
$ cat main.go 
package main

import (
	"fmt"

	"github.com/antchfx/htmlquery"
)

func main() {
	doc, err := htmlquery.LoadDoc("test.html")
	if err != nil {
		panic(err)
	}
	product := htmlquery.FindOne(doc, "//div[contains(@class, 'grid grid-cols-12 gap-0 border-t py-2 px-4')]//div[contains(.,'Product Name')]/..//div[contains(@class, 'text:right')]//span")
	if product != nil {
		fmt.Println("product:", htmlquery.InnerText(product))
	}
}
$ diff main.go main-typo-fixed.go 
14c14
< 	product := htmlquery.FindOne(doc, "//div[contains(@class, 'grid grid-cols-12 gap-0 border-t py-2 px-4')]//div[contains(.,'Product Name')]/..//div[contains(@class, 'text:right')]//span")
---
> 	product := htmlquery.FindOne(doc, "//div[contains(@class, 'grid grid-cols-12 gap-0 border-t py-2 px-4')]//div[contains(.,'Product Name')]/..//div[contains(@class, 'text-right')]//span")
$ go run main.go
$ go run main-typo-fixed.go 
product: Adidas COPA MUNDIAL soccer shoes

I'm not sure why your original XPath expression, with the colon in the class name, works on https://htmlstrip.com/xpath-tester; it does not work in the Chrome DevTools console. There too, changing "text:right" to "text-right" returns the correct element:

image

da70 • Jan 07 '21 23:01

Hello, I checked the URL you gave (view-source:https://www.productfrom.com/product/416492-adidas-copa-mundial-soccer-shoes). The HTML source does not contain text:right anywhere; it only contains text-right.

zhengchun • Jan 08 '21 11:01
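
One way to check this against the saved file is to count both spellings in the raw HTML. This is a rough sketch using only the Go standard library; countSpellings is a hypothetical helper name, and the file is assumed to be saved as test.html in the working directory, as in the steps to reproduce.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// countSpellings returns how often each class spelling occurs in the raw HTML.
// It is a plain substring count, so it also counts occurrences outside of
// class attributes.
func countSpellings(page string) (colon, hyphen int) {
	return strings.Count(page, "text:right"), strings.Count(page, "text-right")
}

func main() {
	data, err := os.ReadFile("test.html")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read test.html:", err)
		return
	}
	colon, hyphen := countSpellings(string(data))
	fmt.Printf("text:right: %d, text-right: %d\n", colon, hyphen)
}
```

If the first count is zero, contains(@class, 'text:right') cannot match any element in that file, whichever XPath engine evaluates it.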