mapbox-gl-js
Add expression operator testing a string against a regular expression
From @ansis on January 15, 2015 20:32
We've talked about this before but it was never implemented. How would it be specified?
Carto uses =~. We could also use regex.
Copied from original issue: mapbox/mapbox-gl-style-spec#233
From @divya1c on March 15, 2016 21:26
Hi! is the regex feature added to mapbox gl yet?
From @tmcw on March 15, 2016 21:33
If the issue is open, the task isn't done yet.
From @1ec5 on July 10, 2016 22:53
regex (or like) would be more discoverable/memorable, since ~ has wildly different meanings in every language.
From @tmcw on November 22, 2016 21:6
To unpack what's necessary to implement this feature:
1. Naming: regex or ~
2. Compatibility across GL JS and GL Native
Problem 1 will likely be 10% of the work or less. Problem 2 is more complex.
If the JavaScript port (GL JS) uses JavaScript's built-in RegExp object, then GL Native will need to include a compatible implementation of regular expressions in order to ensure that maps render exactly the same in a native environment. There are many particular flavors of regular expressions, so picking a feature-filled Native regular expression engine would mean that styles break in less feature-filled GL JS RegExp implementations.
There's also the question of whether GL Native should use platform-provided regular expression libraries, or bring its own on the C++ level. From diving into my handy V8 source checkout (get your own today! they're great), V8's implementation is Irregexp.
From @1ec5 on November 22, 2016 21:59
@jfirebaugh points out that we can use std::regex’s ECMAScript regex support in gl-native to ensure compatibility between GL JS and the native SDKs. However, note that std::regex supports some features that browsers don’t, like [:alnum:].
In any case, I would very much prefer that we use the platform-provided regex facilities in gl-native, similar to how we use the platform-provided facilities for uppercasing and lowercasing strings. (This does lead to minor discrepancies among the platforms: https://github.com/mapbox/mapbox-gl/issues/21#issuecomment-234724866.) Using the platform-provided regex facilities means we don’t incur an increase in the SDK’s size, and it ensures that the runtime styling API is compatible with any other regex the developer uses in their application.
To illustrate my point, style specification filters are represented on iOS and macOS as NSPredicate objects. This is as natural as representing strings as NSString objects. NSPredicate format strings accept a SQL-like syntax, where the MATCHES operator is documented to support ICU regex syntax. However, if core code only supports ECMAScript syntax, then the iOS and macOS SDKs need to transform ICU regex to ECMAScript regex, rejecting any regex that doesn’t translate, and transform in the other direction as well when getting the predicate of a style layer. Otherwise, without this SDK-level transformation, the SDK’s behavior would be perceived as a bug.
I recognize that bringing ICU regex to GL JS would be a challenging task, and that some Studio users could expect ECMAScript regex since the live preview is implemented using GL JS. Fortunately, there’s enough overlap between the two syntaxes that I think we should declare a common subset of ICU and ECMAScript to be the syntax we want to support for filters; anything else (like lookbehind or Unicode properties in ICU, or matching Turkish İ with [A-Z] in ECMAScript) comes with a caveat that it isn’t guaranteed to work across platforms. I think most Studio users would come to Studio without knowing that there are different regex syntaxes, and they’re just as likely to try an ICU or PCRE regex as they are to try an ECMAScript regex.
From @1ec5 on November 22, 2016 22:27
In chat, @tmcw brought up a valid concern that a user might input a regex for a filter that appears to do the right thing in GL JS, but it happens to prevent the layer from showing up at all on iOS. There’s no substitute for testing, but I agree that we should aim to make the live preview in Studio as faithful to the rendered output as possible.
The thing is, I think the subset of features in ICU but not ECMAScript is pretty small, and the subset in ECMAScript but not ICU even smaller. If Studio could detect the use of these features and display a warning icon, that would essentially enforce the subset of regex that we do officially support.
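To make the "detect and warn" idea concrete, here is a minimal sketch of such a check. It is not Studio's implementation: the function name findNonPortableFeatures and the feature list are assumptions, built only from the constructs named in this thread (lookbehind, Unicode properties, [:alnum:]-style classes, and the \u{...} escape). It scans the pattern text naively rather than parsing it, so an escaped backslash before one of these sequences could produce a false positive.

```javascript
// Hypothetical list of constructs this thread flags as not portable
// between ECMAScript and ICU (or std::regex). Not an official subset.
const NON_PORTABLE_FEATURES = [
  { name: 'lookbehind', test: /\(\?<[=!]/ },
  { name: 'Unicode property escape', test: /\\[pP]\{/ },
  { name: 'POSIX character class', test: /\[:\^?[a-z]+:\]/i },
  { name: 'code point escape', test: /\\u\{/ },
];

// Returns the names of the listed constructs that appear in the pattern source.
// This is a plain text scan, not a regex parser.
function findNonPortableFeatures(patternSource) {
  return NON_PORTABLE_FEATURES
    .filter((feature) => feature.test.test(patternSource))
    .map((feature) => feature.name);
}

// Usage: an editor could show a warning icon when the list is non-empty.
const warnings = findNonPortableFeatures('(?<=Route )\\d+');
if (warnings.length > 0) {
  console.log('Pattern may not work on every platform: ' + warnings.join(', '));
}
```

An allow-list (accept only constructs known to be shared) would be stricter than this deny-list, at the cost of needing a real pattern parser.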
Any progress on regex filters?
Is there currently any alternative to regex filters? Maybe not strong pattern matching, but perhaps where a string can be matched to see if it exists in any part of a property value?
If the regexes are static (and not determined at runtime), you can preprocess your data with the regex into a new attribute.
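A rough sketch of that preprocessing step, assuming the data is a GeoJSON FeatureCollection available in JavaScript before it is added to the map. The helper name addRegexFlag and the property names are made up for illustration.

```javascript
// Returns a copy of the FeatureCollection in which every feature carries a
// boolean property recording whether sourceProperty matched the pattern.
// Pass a RegExp without the "g" flag: with "g", test() keeps state in lastIndex.
function addRegexFlag(featureCollection, sourceProperty, pattern, targetProperty) {
  return {
    ...featureCollection,
    features: featureCollection.features.map((feature) => {
      const value = feature.properties ? feature.properties[sourceProperty] : undefined;
      return {
        ...feature,
        properties: {
          ...feature.properties,
          [targetProperty]: typeof value === 'string' && pattern.test(value),
        },
      };
    }),
  };
}

// Usage with hypothetical data: flag features whose name ends in "Street".
const roads = {
  type: 'FeatureCollection',
  features: [
    { type: 'Feature', geometry: null, properties: { name: 'Main Street' } },
    { type: 'Feature', geometry: null, properties: { name: 'Broadway' } },
  ],
};
const flaggedRoads = addRegexFlag(roads, 'name', /Street$/, 'is_street');

// The layer filter then only needs a plain equality test on the new attribute.
const streetFilter = ['==', ['get', 'is_street'], true];
```

Because the regex runs once, outside the style, the result is the same on every platform that renders the data.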
In case it helps anyone else who is stuck on this (and only needs to use querySourceFeatures in JS), I've hacked this together https://github.com/maphubs/mapbox-gl-regex-query
Could this be done with custom pluggable filters? All my hack really does is add a custom operator to the filter compile method and give it a custom comparator function. That might offer the best of both worlds? Then the platforms just ignore any operators they don't know, kind of like browser-specific CSS rules. It would also still allow a more limited SQL-like syntax for a simpler cross-platform option, for Studio users etc.
filter: ['like', '%name%']
or for advanced users that want to use regex
filter: [
'all',
['~js', '/.*name.*/g'],
['~ios', '...']
]
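For the simpler SQL-like option, here is a sketch of how a 'like' pattern could be evaluated on the JS side, for example against the array returned by querySourceFeatures. The 'like' operator is only proposed above and does not exist in the style spec; likeToRegExp and filterByLike are hypothetical helpers, and only the % wildcard is handled.

```javascript
// Escapes characters that have a special meaning in a JavaScript RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Converts a SQL-like pattern, where % matches any run of characters,
// into an anchored RegExp. The "s" flag lets "." also match newlines.
function likeToRegExp(likePattern) {
  const body = likePattern.split('%').map(escapeRegExp).join('.*');
  return new RegExp('^' + body + '$', 's');
}

// Keeps only the features whose string property matches the pattern.
function filterByLike(features, property, likePattern) {
  const pattern = likeToRegExp(likePattern);
  return features.filter((feature) => {
    const value = feature.properties ? feature.properties[property] : undefined;
    return typeof value === 'string' && pattern.test(value);
  });
}

// Usage with hypothetical features, shaped like querySourceFeatures results.
const sampleFeatures = [
  { properties: { name: 'Main Street' } },
  { properties: { name: 'Broadway' } },
  { properties: {} },
];
const streets = filterByLike(sampleFeatures, 'name', '%Street');
```

Since every regex metacharacter in the input is escaped, this form avoids the engine differences discussed earlier in the thread.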
@kriscarle can your hack be used without npm or yarn, and how? It looks very promising!
@politvs lets move that discussion here https://github.com/maphubs/mapbox-gl-regex-query/issues/1 so we don't spam the Mapbox team :)
Now that expressions have landed and can be used as filters, this is actually a request to add – wait for it – a regular expression expression operator. Specifically, there should be an operator that tests whether a string matches a regular expression.
My two cents (and a pull request to get the discussion going): returning the match groups as an array, instead of just a boolean value, would provide a much more flexible base to build on.
Returning match groups would allow for:
- The basic checking if the expression matched or not (by checking if it returned null)
- Search and replace (by capturing the bits that should not be replaced, and combining them and the replacement string with "concat") #4100
- Extracting portions of a property value (this is my personal reason for wanting this, I have data I can't easily modify, and need to get some pieces of text extracted from property values to be shown).
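The three uses listed above can be sketched in plain JavaScript. This is only an illustration of the semantics, not the operator from the pull request: matchGroups is a hypothetical stand-in, and ordinary string concatenation stands in for the "concat" expression.

```javascript
// Returns the full match followed by the capture groups, or null if the
// pattern does not match. Stand-in for the proposed expression operator.
function matchGroups(input, patternSource) {
  const result = new RegExp(patternSource).exec(input);
  return result === null ? null : Array.from(result);
}

const routePattern = '^(Route )(\\d+)( .*)$';

// 1. Basic match test: check for null.
const isRoute = matchGroups('Route 66 North', routePattern) !== null;

// 2. Search and replace: capture the parts to keep, then concatenate them
//    around the replacement string.
const routeGroups = matchGroups('Route 66 North', routePattern);
const renumbered = routeGroups[1] + '99' + routeGroups[3];

// 3. Extraction: pick a single capture group out of the property value.
const routeNumber = routeGroups[2];
```

A boolean-only operator would cover the first use but not the other two.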
I think the biggest open issue we need to resolve here is the one about cross-platform compatibility. Given @1ec5's point in https://github.com/mapbox/mapbox-gl-js/issues/4089#issuecomment-276799020 that ICU and ECMAScript regexp syntaxes mostly overlap, would it be feasible to just only allow the common subset of both?
Looking at the latest ECMA script regex spec and the specification of ICU regex, it looks like ICU is a superset of ECMA script syntax.
I've checked all basic operations as well as the syntax for things like non capturing groups, lookaheads etc and all the ones in ECMA are also in ICU. However, I haven't been able to wrap my head around the unicode specific bits so I can't really comment on those.
Flag handling does seem to differ more though:
- No "g" flag in ICU but implementing it manually shouldn't be too difficult (actually a bit unclear what the default behavior is for matching multiple times in ICU)
- "u" flag causes ECMA to treat the pattern as unicode. I couldn't find a reference to it in ICU, but I expect this is the default.
- "y" flag is present in ECMA but not ICU but also irrelevant in this implementation.
Other ECMA flags are present in ICU.
Looks like the u flag also enables a unicode escape syntax not in ICU \u{1D306}
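As a sketch of how those flag findings could be checked mechanically, the snippet below classifies a flag string using only the observations above. The notes are this thread's tentative conclusions, not a verified comparison of the two specifications, and nonPortableFlags is a hypothetical helper.

```javascript
// ECMAScript flags that, per the notes above, have no direct ICU counterpart.
const FLAG_NOTES = new Map([
  ['g', 'not in ICU; repeated matching would need to be implemented manually'],
  ['u', 'no ICU reference found; Unicode handling is presumably the ICU default'],
  ['y', 'not in ICU; also irrelevant for a match-test operator'],
]);

// Returns one entry per flag in the string that appears in FLAG_NOTES.
function nonPortableFlags(flags) {
  return Array.from(flags)
    .filter((flag) => FLAG_NOTES.has(flag))
    .map((flag) => ({ flag, note: FLAG_NOTES.get(flag) }));
}

// Usage: "i" passes through silently, "g" is reported.
const flagged = nonPortableFlags('gi');
```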
@lucaswoj Any news on the implementation of this? There is an open PR (#6228) that has been around for a few months. This would be immensely useful.
@stdmn Unfortunately I'm not a good person to ask about this. I haven't worked on the GL core team for some time. It looks like that PR is still stuck on some design decisions. I'm sure folks would be interested in continuing the discussion if you took ownership of the PR.
Any update on this one? It would be immensely useful to me.
I closed https://github.com/mapbox/mapbox-gl-js/pull/6228 because there are still many open questions:
- Styles (and thus expressions) are executed on multiple platforms, and JavaScript is just one of them. Regex engines on those platforms support a widely different spectrum of features, so it's easy to create regular expressions that work on one platform but don't work on another. We generally expect styles to work on all platforms, and platform-specific regular expressions would counteract this expectation.
- JavaScript regular expressions are heavily flawed when processing text, which is what I imagine as the main application for a regex expression. @1ec5 explains more in https://github.com/mapbox/mapbox-gl-js/pull/6228#issuecomment-369054652
If regex isn't possible it would be very helpful to at least have startsWith and endsWith.