Could you support PCRE regular expressions?

Open pyrog opened this issue 9 years ago • 8 comments

Hi,

I would like to use "advanced" regex features such as word boundaries or lookahead.

regex word boundary

The "Perl Compatible Regular Expressions" seem easy to use. See: http://www.regular-expressions.info/pcre.html
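To illustrate the two features mentioned above, here is a minimal sketch using Python's `re` module, which is not PCRE itself but supports the same `\b` and `(?=...)` syntax. The sample names are invented for illustration and are not taken from OSM data.

```python
import re

# Sample tag values, invented for illustration only.
names = ["Bar Central", "Barber Shop", "Snack bar", "Crowbar"]

# \b is a word boundary: "bar" must stand alone as a word.
word = re.compile(r"\bbar\b", re.IGNORECASE)
whole_word_hits = [n for n in names if word.search(n)]

# (?=...) is a positive lookahead: "bar" only when followed by "ber".
lookahead = re.compile(r"bar(?=ber)", re.IGNORECASE)
lookahead_hits = [n for n in names if lookahead.search(n)]

print(whole_word_hits)  # names containing "bar" as a separate word
print(lookahead_hits)   # names where "bar" is directly followed by "ber"
```

Neither construct is part of POSIX extended regular expressions, which is why they would need PCRE (or a similar engine) on the server side.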

Best regards,

Yves

pyrog avatar Nov 01 '14 09:11 pyrog

Does it work? If so, how can I test it?

Could you make a pull request?

Thanks

pyrog avatar Dec 01 '14 17:12 pyrog

Well, in the prototype, all regexes are handled by PCRE now (there's no way to switch between PCRE and POSIX regex yet). It sort of works on my local machine, but as of today you'd need to set up your own instance for testing.

The big question, however, is whether Roland (@drolbr) wants to introduce an additional dependency on PCRE. Right now, there are only very few dependencies on other libraries.

mmd-osm avatar Dec 14 '14 10:12 mmd-osm

> there's no way to switch between PCRE and POSIX regex yet

Is that an issue? I think (but I could be wrong) that PCRE can do the same searches as POSIX regex, and more.

@drolbr What is your position on PCRE? :smile:

pyrog avatar Dec 14 '14 10:12 pyrog

@pyrog: In the meantime, you could do a few tests with PCRE enabled on the test instance: http://overpass-turbo.eu/s/b1e

Here's another example which will return ways with a single building=* tag only: http://overpass-turbo.eu/s/b0B

Disclaimer: there's no guarantee that this will ever make it into the official branch, and the link will be discontinued after some time.

mmd-osm avatar Mar 10 '15 20:03 mmd-osm

PCRE has shown some performance regressions with certain UTF-8 characters during performance testing; see http://wiki.openstreetmap.org/wiki/User:Mmd/Overpass_API/Performance_Project_2016.

Example:

node["name"~"[قق][اا][لل]"]

I would recommend leaving POSIX as the default and enabling PCRE only via some explicit query setting.

mmd-osm avatar Jun 04 '16 09:06 mmd-osm

This issue should be closed; the follow-up is in #332.

mmd-osm avatar Jul 22 '17 10:07 mmd-osm

Hi again,

I want to use positive or negative lookahead.

For example, I want to find invalid wikimedia_commons values (those not starting with File: or Category:):

wikimedia_commons~/^(?!(Category|File):).*/i

result: static error: Invalid regular expression: "^(?!(Category|File):).*"

I could use wikimedia_commons~/http/ but then I lose values like 1524488623511.jpg
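For this particular case, the negative lookahead selects the same values as a plain "does not match" test on the prefix pattern. The sketch below shows this with Python's `re` module (not PCRE, but it accepts the same lookahead syntax). The sample values are made up, apart from the filename quoted above.

```python
import re

# Hypothetical wikimedia_commons values; only the last one appears in this thread.
values = [
    "File:Example.jpg",
    "category:Bridges",
    "http://example.org/a.jpg",
    "1524488623511.jpg",
]

# The lookahead pattern from the comment above, case-insensitive.
lookahead = re.compile(r"^(?!(Category|File):).*", re.IGNORECASE)

# A lookahead-free pattern (valid POSIX ERE syntax) for the *valid* prefixes.
prefix = re.compile(r"^(Category|File):", re.IGNORECASE)

with_lookahead = [v for v in values if lookahead.search(v)]
with_negation = [v for v in values if not prefix.search(v)]

print(with_lookahead)
print(with_negation)
```

Assuming Overpass QL's negated regex filter behaves the same way, something like `["wikimedia_commons"]["wikimedia_commons"!~"^(Category|File):",i]` might serve as a workaround without PCRE. That query fragment is untested here.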

pyrog avatar Nov 12 '19 14:11 pyrog

> I would recommend leaving POSIX as the default and enabling PCRE only via some explicit query setting.

@mmd-osm Is UTF-8 handling still slow in PCRE? If not, could you please replace POSIX Extended with it? If it is, could you please add a query setting for PCRE?

Lookaheads and lookbehinds would be really useful to filter multiple tag values separated with semicolons, for example.
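As a rough sketch of that use case, the following Python snippet matches `bus` only when it is a complete item in a semicolon-separated value, once with lookarounds and once with a lookahead-free alternation for comparison. Python's `re` stands in for PCRE here, and the sample values are invented.

```python
import re

# Hypothetical semicolon-separated tag values.
tags = ["bus", "bus;tram", "tram;bus;subway", "minibus", "bus_stop;tram"]

# Lookaround version: "bus" must not be preceded or followed by a
# character other than ";" (start and end of string are allowed too).
pcre_style = re.compile(r"(?<![^;])bus(?![^;])")

# Alternation version without lookarounds, valid POSIX ERE syntax.
posix_style = re.compile(r"(^|;)bus(;|$)")

pcre_hits = [t for t in tags if pcre_style.search(t)]
posix_hits = [t for t in tags if posix_style.search(t)]

print(pcre_hits)
print(posix_hits)
```

Both patterns select the same values in this simple case. The lookaround form does not consume the separators, which matters mainly when several conditions have to be combined in one expression.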

gy-mate avatar Mar 04 '24 14:03 gy-mate