cssselect

Incorrect use of XPath name() function

Open kovidgoyal opened this issue 13 years ago • 1 comment

The use of the name() function for matching tags breaks with documents that have a default namespace or multiple namespace prefixes mapping to the same namespace.

For example, the CSS selector

h|p + h|p

becomes

descendant-or-self::h:p/following-sibling::*[name() = 'h:p' and (position() = 1)]

When this query is run on an XHTML document that declares XHTML as its default namespace, it produces no matches, because the name() function returns "p" rather than "h:p". Similarly, it fails on a document that binds the XHTML namespace to a prefix other than h.
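To illustrate why the prefix is an unreliable thing to match on, here is a minimal sketch using only Python's standard library (not cssselect or lxml). The sample documents and the `expanded_names` helper are made up for this illustration. It shows that the same document written with three different prefix choices has identical namespace-URI and local-name pairs, even though the qualified names in the source differ.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# The same tiny document, serialized with three different prefix choices.
# These strings are illustrative samples, not taken from the issue.
DOCS = {
    "default": f'<html xmlns="{XHTML}"><p/><p/></html>',
    "prefix h": f'<h:html xmlns:h="{XHTML}"><h:p/><h:p/></h:html>',
    "prefix x": f'<x:html xmlns:x="{XHTML}"><x:p/><x:p/></x:html>',
}


def expanded_names(xml_text):
    """Return each element's tag as '{namespace-uri}local-name', in document order."""
    root = ET.fromstring(xml_text)
    return [element.tag for element in root.iter()]


for label, text in DOCS.items():
    print(label, expanded_names(text))
```

All three documents yield the same list of expanded names, so a test based on the namespace URI and local name treats them alike, whereas a test on the qualified name only matches the document that happens to use the h prefix.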

A possible solution is to have the css_to_xpath function take a namespaces argument that contains a mapping of prefixes to URIs and then use local-name() and namespace-uri() instead of name(). The argument can default to None, in which case it can use the present behavior, for backward compatibility.
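A rough sketch of what the proposed translation could look like is below. The `element_test` helper and its signature are hypothetical and are not part of cssselect; it only builds the XPath predicate string and assumes the namespace URI contains no single quotes.

```python
def element_test(prefix, local_name, namespaces=None):
    """Hypothetical helper: build an XPath predicate that tests an element's name.

    With namespaces=None it reproduces the present name()-based test.
    Otherwise it resolves the prefix to a URI and tests local-name()
    and namespace-uri(), which does not depend on the document's prefixes.
    """
    if namespaces is None:
        # Backward-compatible behavior: compare against the qualified name.
        qname = f"{prefix}:{local_name}" if prefix else local_name
        return f"name() = '{qname}'"
    # Assumption: the URI contains no single quote, so plain quoting is safe.
    uri = namespaces[prefix]
    return f"local-name() = '{local_name}' and namespace-uri() = '{uri}'"


ns = {"h": "http://www.w3.org/1999/xhtml"}
test = element_test("h", "p", ns)
# One possible prefix-independent form of the translation of "h|p + h|p".
xpath = (
    f"descendant-or-self::*[{test}]"
    f"/following-sibling::*[{test} and (position() = 1)]"
)
print(xpath)
```

Because the prefix is resolved to a URI when the expression is built, the resulting XPath no longer depends on which prefix, if any, the document uses for the XHTML namespace.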

See http://lenzconsulting.com/namespaces-in-xslt/#perils_of_the_name_function for more details on the problems caused by using the name() function.

kovidgoyal avatar Oct 20 '12 03:10 kovidgoyal

Hi,

Sorry for the delay in responding. I just confirmed that the name() function in lxml uses the prefix from the document source rather than the namespace mapping of the XPath expression.

So there is a bug, but namespace handling in cssselect is generally broken. See #9. It needs a rewrite and I know how to do it but it’s just low priority for me right now. Until I get to it, anyone willing to give it a go is welcome to do so.

SimonSapin avatar Nov 06 '12 14:11 SimonSapin