Overpass-API
Could you support PCRE regular expressions?
Hi,
I would like to use "advanced" regex features such as word boundaries or lookahead.
The "Perl Compatible Regular Expressions" seem easy to use. See: http://www.regular-expressions.info/pcre.html
Best regards,
Yves
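To illustrate the word-boundary part of the request, here is a minimal Python sketch. Python's `re` module is used only as a stand-in for a Perl-style engine; it is not what Overpass API runs, and the sample names and the `names_with_word` helper are hypothetical.

```python
import re

# \b is a word-boundary assertion: it matches a position between a word
# character and a non-word character, not an actual character.
WORD_BAR = re.compile(r"\bbar\b", re.IGNORECASE)

def names_with_word(names):
    """Return the names that contain 'bar' as a whole word (hypothetical helper)."""
    return [n for n in names if WORD_BAR.search(n)]

samples = ["Bar Central", "Barber Shop", "Snack bar", "Embargo"]
print(names_with_word(samples))
```

Only the names where "bar" stands alone are kept; "Barber Shop" and "Embargo" are skipped because the match would start or end inside a longer word.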
Does it work? If so, how can I test it?
Could you make a pull request?
Thanks
Well, in the prototype, all regex are handled by pcre now (there's no way to switch between pcre and posix regex yet). It sort of works on my local machine, but you'd need to set up your own instance for testing as of today.
The big question, however, is whether Roland (@drolbr) wants to introduce an additional dependency on PCRE. Right now, there are only very few dependencies on other libraries.
> there's no way to switch between pcre and posix regex yet
Is that an issue? I think, but I could be wrong, that you can do the same searches and more with PCRE than with POSIX regex.
@drolbr What is your position on PCRE? :smile:
@pyrog : In the meantime, you could do a few tests with PCRE enabled on the test instance: http://overpass-turbo.eu/s/b1e
Here's another example which will return ways with a single building=* tag only: http://overpass-turbo.eu/s/b0B
Disclaimer: there's no guarantee that this will ever make it into the official branch and the link will be discontinued after some time.
PCRE has shown some performance regressions with certain UTF-8 characters during performance testing, see http://wiki.openstreetmap.org/wiki/User:Mmd/Overpass_API/Performance_Project_2016.
Example:
node["name"~"[قق][اا][لل]"]
I would recommend leaving POSIX as the default and enabling PCRE only via some explicit query setting.
This issue should be closed; the follow-up is in #332.
Hi again,
I want to use positive or negative lookahead.
For example, I want to find invalid values for Wikimedia pictures (values that do not start with File: or Category:):
wikimedia_commons~/^(?!(Category|File):).*/i
result: static error: Invalid regular expression: "^(?!(Category|File):).*"
I could use wikimedia_commons~/http/ instead, but then I lose values like 1524488623511.jpg
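For reference, this is what the negative-lookahead pattern above would select in an engine that supports it. The sketch below uses Python's `re` module purely as a stand-in for a Perl-style engine (it is not Overpass API itself); the sample values and the `invalid_values` helper are made up for illustration.

```python
import re

# Pattern from the post: match only if the value does NOT start with
# "Category:" or "File:" (case-insensitive, like the /i flag).
INVALID = re.compile(r"^(?!(Category|File):).*", re.IGNORECASE)

def invalid_values(values):
    """Return values that start with neither File: nor Category: (hypothetical helper)."""
    return [v for v in values if INVALID.match(v)]

samples = [
    "File:Example.jpg",
    "Category:Bridges",
    "http://example.org/picture.jpg",
    "1524488623511.jpg",
    "file:lowercase.jpg",
]
print(invalid_values(samples))
```

Unlike the wikimedia_commons~/http/ fallback, this also catches bare file names such as 1524488623511.jpg, while file:lowercase.jpg is accepted because of the case-insensitive flag.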
> I would recommend leaving POSIX as the default and enabling PCRE only via some explicit query setting.
@mmd-osm Is UTF-8 handling still slow in PCRE? If not, could you please replace POSIX Extended with it? If so, could you please add a query setting for PCRE?
Lookaheads and lookbehinds would be really useful to filter multiple tag values separated with semicolons, for example.
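As a rough sketch of the semicolon use case, the following Python snippet matches one item inside a semicolon-separated tag value using a lookbehind and a lookahead. Python's `re` module stands in for a Perl-style engine here; the `has_item` helper and the sample values are hypothetical, and the sketch assumes no whitespace around the semicolons.

```python
import re

def has_item(value, item):
    """Check whether `item` is one complete entry of a ';'-separated value.

    (?<![^;]) : the item is preceded by ';' or by the start of the string
    (?![^;])  : the item is followed by ';' or by the end of the string
    """
    pattern = r"(?<![^;])" + re.escape(item) + r"(?![^;])"
    return re.search(pattern, value) is not None

for value in ["bus;tram", "tram;bus", "bus", "minibus;tram", "tram;bus_stop"]:
    print(value, has_item(value, "bus"))
```

The lookarounds consume no characters, so the match covers only the item itself, and partial hits such as minibus or bus_stop are rejected.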