
new regexp matching operator

Open snar opened this issue 6 years ago • 2 comments

Sometimes I need to find elements by partial rather than full match (for example, all elements whose contents start with some string). Unfortunately this was not possible with the etree implementation, so I implemented a regexp-matching operator, ~. Example use: FindElements("//name[text()~'^ae.*']").

Please note that the implementation is somewhat ugly: because the etree library uses a simple strings.Split on "[" to separate path segments, it's not (yet?) possible to use regexes containing brackets. For example, an attempt to call FindElements("//name[text()~'^ae[0-9]']") will lead to a 'bad brackets' error.
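To illustrate the bracket limitation, here is a minimal sketch. It assumes only what is described above, namely that the segment is split on "[". The helper splitSegment is hypothetical and is not etree's actual parser code.

```go
package main

import (
	"fmt"
	"strings"
)

// splitSegment is a hypothetical stand-in for a parser that separates a path
// segment from its filters by splitting on "[". It is NOT etree's real code.
func splitSegment(segment string) []string {
	return strings.Split(segment, "[")
}

func main() {
	// No brackets inside the regexp: the split yields the tag and one filter.
	fmt.Printf("%q\n", splitSegment("name[text()~'^ae.*']"))

	// A character class introduces a second "[", so the regexp is cut in the
	// middle and the split yields three pieces instead of two.
	fmt.Printf("%q\n", splitSegment("name[text()~'^ae[0-9]']"))
}
```

In the second call the filter text is no longer a single piece ending in "]", which is the kind of input a split-based parser would reject as mismatched brackets.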

snar avatar Mar 21 '19 13:03 snar

Sorry it's taken me so long to comment on this pull request.

I like the idea behind this change, but I wonder if there's a way to make the syntax more similar to the way regular expressions are written in other contexts. For instance, if slash characters delimited the pattern, it might be more obvious to someone reading a path that it uses a regexp.

What do you think about something like this:

"./bookstore/book[author=/Kurt.*/]/title"
"//book[p:price=/29.*/]/title"
"//book/price[text()=/29.*/]"
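As a rough sketch of how a slash-delimited filter like the ones above could be recognized, assuming the filter text between the brackets has already been extracted: the helper parseRegexpFilter is hypothetical, is not part of etree, and uses only Go's standard regexp and strings packages.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// parseRegexpFilter is a hypothetical helper, not an etree API. It splits a
// filter such as "author=/Kurt.*/" into its key and a compiled regexp.
func parseRegexpFilter(filter string) (string, *regexp.Regexp, error) {
	// Split on the first "=" so that "=" may still appear inside the pattern.
	key, pattern, found := strings.Cut(filter, "=")
	if !found {
		return "", nil, fmt.Errorf("filter %q has no '='", filter)
	}
	// The value must be wrapped in a leading and a trailing slash.
	if len(pattern) < 2 || !strings.HasPrefix(pattern, "/") || !strings.HasSuffix(pattern, "/") {
		return "", nil, fmt.Errorf("filter %q: value is not delimited by slashes", filter)
	}
	re, err := regexp.Compile(pattern[1 : len(pattern)-1])
	if err != nil {
		return "", nil, err
	}
	return key, re, nil
}

func main() {
	key, re, err := parseRegexpFilter("author=/Kurt.*/")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(key, re.MatchString("Kurt Vonnegut"), re.MatchString("Jane Austen"))
}
```

Note that Go's MatchString is unanchored, so under this sketch a pattern like /29.*/ would also match "129.99" unless it is written as /^29.*/.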

Your concern about the current parser's limitations w.r.t. things like bracket characters is a valid one. However, I have another pull request I've been holding off on merging that has a more sophisticated parser and might make this a moot point. I'm still deciding whether to merge that change.

beevik avatar Aug 28 '19 00:08 beevik

I was looking for this exact functionality, and was about to do this when I found this PR. Any chance it will get pulled?

rkoshy avatar Aug 07 '20 02:08 rkoshy

Closing due to lack of response.

beevik avatar May 07 '23 20:05 beevik