pykeepass Use elementpath for finding via XPath (2.0)

The problem is that lxml uses XPath 1.0. That version does not know how to escape those characters as those are special/forbidden. Instead, we can use elementpath which seems to be quite popular (among other XPath 2.0 related packages), but it breaks the tests.

For XPath 1.0 fixing #123 doesn't seem possible, so perhaps rewriting everything to elementpath.select(root, path) would work if the rest of the codebase was adjusted as well.

Closes #123

Oct 05 '20 19:10 KeyWeeUsr

Unfortunately the group lookup returns an empty list from the elementpath.select() and I have no clue why, so only for this single particular XPath I've changed it back to the tree.xpath() call and the rest works nicely with XPath 2.0. All tests passed, so the quotes + the rest works.

.......................................................................................
----------------------------------------------------------------------
Ran 87 tests in 12.947s

OK

Oct 08 '20 22:10 KeyWeeUsr

How about this function (taken from here):

def escape_string_for_xpath(s):
    if '"' in s and "'" in s:
        return 'concat(%s)' % ", '\"',".join('"%s"' % x for x in s.split('"'))
    elif '"' in s:
        return "'%s'" % s
    return '"%s"' % s

Then we'd modify the strings in xpath.py and remove the double quotes around the {} substitutions and let the function above figure out quoting.

e.g.

>>> kp = PyKeePass('test4.kdbx', 'password', 'test4.key')
>>> title_xpath = '/String/Key[text()="Title"]/../Value[text()={}]/../..'
>>> title = 'quote test -> " <-'
>>> kp._xpath('//Entry' + title_xpath.format(escape_string_for_xpath(title)), cast=True, first=True)
Entry: "quote test -> " <- (None)"

Oct 18 '20 06:10 Evidlo

Is there a reason we should prefer elementpath to the above solution? For the sake of people packaging pykeepass, I want to avoid adding any more dependencies unless its really necessary.

Nov 11 '20 16:11 Evidlo

Bump.

Dec 13 '20 00:12 Evidlo

One of the reason is that elementpath allows usage of XPath 1.0 as well as XPath 2.0 and by using the latter we're leveraging out-of-box support for quotes and other fixes, so overall it'd be better to use already newer model which will be even easier for switching to 3.x if/when required.

The other reason is that basically the escaping function is error prone and if there's any other character that XPath 1.0 doesn't like it'll break. Creating a list of unsupported characters and trying to escape them this way is rather ugly.

Re the dependencies reason:

we can't just be without deps nowadays, CPython itself has quite a lot of them already just for working and lxml is one of them
if we don't want to handle XML parsing ourselves, we need something/someone whose primary use-case is actually doing that
elementpath has no deps currently and is pure Python

Dec 14 '20 20:12 KeyWeeUsr

Big +1 for merging this

Dec 15 '20 07:12 pschmitt

The other reason is that basically the escaping function is error prone and if there's any other character that XPath 1.0 doesn't like it'll break. Creating a list of unsupported characters and trying to escape them this way is rather ugly.

If there are other characters besides " which are problematic, then I agree we shouldn't try to handle this ourselves. However, I'm not aware of any others characters that have this problem yet. Also it seems like this PR still needs special quote escaping.

As for all the new features of XPath 2.0, I don't see a need right now, as our queries in xpath.py are pretty simple.

In my opinion, we should keep this PR on the backburner until new issues arise, then take a second look.

Jan 16 '21 04:01 Evidlo