draft-ietf-jsonpath-base

Add a new selector type: regex selector

Open He-Pin opened this issue 1 year ago • 14 comments

We have fields in an object/array that are generated by the backend and named header_$id. I would like to select them with a regex.

He-Pin avatar Apr 30 '24 02:04 He-Pin

Hey there @He-Pin. We actually have some support for that in the RFC. There are two regex functions, match() and search().

match() is implicitly anchored and will match on the full string.

search() is unanchored and will match on substrings.

Both use a flavor of regex called i-regexp, which was developed to be a compatible subset of most commonly used regex engines.
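To make the anchoring difference concrete, here is a minimal Java sketch. It uses java.util.regex purely as a stand-in (it is not an i-regexp engine), on the assumption that Matcher.matches() behaves like match() and Matcher.find() like search() for a simple pattern such as this one. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; java.util.regex stands in for an i-regexp engine.
class AnchoringDemo {
    // Full-string test, analogous to match(): the whole input must match.
    static boolean matchLike(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Substring test, analogous to search(): any part of the input may match.
    static boolean searchLike(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchLike("header_[0-9]+", "header_1"));     // true
        System.out.println(matchLike("header_[0-9]+", "my_header_1"));  // false
        System.out.println(searchLike("header_[0-9]+", "my_header_1")); // true
    }
}
```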

gregsdennis avatar Apr 30 '24 03:04 gregsdennis

I checked those, but they don't seem to fit our use case.

{
  "data": {
    "header_1": {
      "a": "1",
      "b": "2",
      "body": "{\"c\":\"3\"}"
    },
    "header_2": {
      "a": "1",
      "b": "2",
      "body": "{\"c\":\"3\"}"
    }
  }
}

Background: we want to select some JSON fields for translation, and we tried a Java JSONPath implementation. Given the JSON above, we want to select the fields header_1 and header_2 first. Is that supported by the current RFC?

I was using $.data[?(@.keys() =~ /header_\d+/i)], but it doesn't work. So now I'm implementing one based on the RFC, with this extended grammar:

regexSelector: /string-literal/

Then I can write $[data][/header_\d+/]

As you can see, the main point here is that we select on the names of the object's child properties.

He-Pin avatar Apr 30 '24 03:04 He-Pin

Oh, you want the property names to be matched, not the values.

That, I think is likely going to be covered by #516, which is the piece you're missing. Once you can access the property names, you should be able to pass them into the functions.

gregsdennis avatar Apr 30 '24 04:04 gregsdennis

I think we need the pointer to the child property name, maybe key() not keys().

He-Pin avatar Apr 30 '24 05:04 He-Pin

@gregsdennis as in https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/109 , I have implemented this with a new selector, RegExpSelector, which works on the names of an ObjectNode's properties.

He-Pin avatar May 08 '24 05:05 He-Pin

@He-Pin it's great that you've been able to implement it, but be aware that because it's not a standard behavior, it's not interoperable.

We'll leave this open as an idea for a possible JSON Path v2, but there's no such discussion at the moment. Continuing to push this idea in the short term isn't going to make that happen any faster.

gregsdennis avatar May 08 '24 21:05 gregsdennis

Understood. It's for an internal need, so that should be fine.

He-Pin avatar May 09 '24 01:05 He-Pin

Another aspect of adding a regex selector is that there's no way to specify what kind of matching you want, which is why we have match() and search() functions rather than a simple ~= operator.

gregsdennis avatar May 10 '24 06:05 gregsdennis

Yes, as it's a valid name too. But the name selector is written inside quotes, as '$name', while the regex selector is written inside slashes, as /$regex/.

He-Pin avatar May 10 '24 06:05 He-Pin

An update on this: we are currently using:

   * `/ $regexExp / $flags`
   * */
  private def regex[_: P]: P[Unit] = P("/" ~/ nonSlashOrEscapedSlash ~ ("/" ~ CharIn("idmsuUx").rep()))

  `regexp-selector` | `name-selector` | `wildcard-selector` | `slice-selector` | `index-selector` | `filter-selector`
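As a rough illustration of how a `/ $regexExp / $flags` literal could be turned into a compiled pattern, here is a small Java sketch. It assumes the flag letters idmsuUx carry the same meaning as Java's embedded regex flags and that an escaped slash inside the body is written as `\/`. The class RegexLiteral and its parse method are hypothetical and are not taken from the implementation discussed in this thread.

```java
import java.util.regex.Pattern;

// Hypothetical helper: parses "/body/flags" into a java.util.regex.Pattern.
class RegexLiteral {
    static Pattern parse(String literal) {
        if (literal.isEmpty() || literal.charAt(0) != '/') {
            throw new IllegalArgumentException("regex literal must start with '/'");
        }
        // Find the first unescaped '/' after the opening one.
        int close = -1;
        for (int i = 1; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (c == '\\') { i++; continue; }  // skip the escaped character
            if (c == '/') { close = i; break; }
        }
        if (close < 0) {
            throw new IllegalArgumentException("missing closing '/'");
        }
        String body = literal.substring(1, close);
        int flags = 0;
        // Assumption: letters map to Java's embedded flag letters.
        for (char f : literal.substring(close + 1).toCharArray()) {
            switch (f) {
                case 'i': flags |= Pattern.CASE_INSENSITIVE; break;
                case 'd': flags |= Pattern.UNIX_LINES; break;
                case 'm': flags |= Pattern.MULTILINE; break;
                case 's': flags |= Pattern.DOTALL; break;
                case 'u': flags |= Pattern.UNICODE_CASE; break;
                case 'U': flags |= Pattern.UNICODE_CHARACTER_CLASS; break;
                case 'x': flags |= Pattern.COMMENTS; break;
                default: throw new IllegalArgumentException("unknown flag: " + f);
            }
        }
        return Pattern.compile(body, flags);
    }

    public static void main(String[] args) {
        Pattern p = parse("/header_\\d+/i");
        System.out.println(p.matcher("HEADER_12").matches()); // true
    }
}
```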

I think one advantage of regexp-selector is that it is more lightweight than the search function: it does not require us to evaluate through the filter-expression-evaluator, yet it still covers 80% of cases.

And there are real-world needs for this; see: https://github.com/json-path/JsonPath/issues/949

He-Pin avatar Dec 17 '24 05:12 He-Pin

Edit: yes I see the difference. The regex needs to apply to the key, not the value.


~there are real-world needs for this~

~That issue is not indicative of a "need". The spec offers a solution. Yes, it's more verbose, but it also more explicitly expresses the intent of the path, which means it's more interoperable (the same path will evaluate consistently across implementations).~

gregsdennis avatar Dec 17 '24 06:12 gregsdennis

I think this is a possibility for a potential JSON Path 2.

gregsdennis avatar Dec 17 '24 06:12 gregsdennis

Yes, our current implementation is:

    private void evaluateRegExpSelector(final Node match,
                                        final Pattern pattern,
                                        final boolean isLastSegment,
                                        final boolean isDescendant,
                                        final Consumer<Node> resultNodeCollector) {
        final var node = match.currentNodeValue();
        if (node instanceof ObjectNode objectNode) {
            for (Map.Entry<String, JsonNode> member : objectNode.properties()) {
                final String key = member.getKey();
                if (pattern.matcher(key).matches()) {
                    final var value = member.getValue();
                    final var location = match.location().append(key);
                    final var newNode = newNode(objectNode, value, key, location, isLastSegment, isDescendant);
                    resultNodeCollector.accept(newNode);
                }
            }
            increaseComplexity(objectNode.size());
        }
    }

Here we test the regex against each child property name with pattern.matcher(key).matches().

He-Pin avatar Dec 17 '24 06:12 He-Pin

As I had mentioned before, a choice will need to be made between match and search semantics. Or maybe a syntax that allows the user to specify which they want.
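To picture what that choice means for property names, here is a hypothetical Java sketch in which the selection step takes an explicit mode. The Mode enum, the selectKeys method, and the sample keys are all invented for illustration, and a plain Map stands in for a JSON object. Nothing here is proposed syntax.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: select member names of an object under either semantics.
class KeySelectDemo {
    enum Mode { MATCH, SEARCH }

    static List<String> selectKeys(Map<String, ?> object, Pattern pattern, Mode mode) {
        List<String> selected = new ArrayList<>();
        for (String key : object.keySet()) {
            Matcher m = pattern.matcher(key);
            // MATCH requires the whole name to match; SEARCH accepts any substring.
            boolean hit = (mode == Mode.MATCH) ? m.matches() : m.find();
            if (hit) {
                selected.add(key);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("header_1", "x");
        data.put("old_header_2", "y");
        data.put("footer", "z");
        Pattern p = Pattern.compile("header_\\d+");
        System.out.println(selectKeys(data, p, Mode.MATCH));  // [header_1]
        System.out.println(selectKeys(data, p, Mode.SEARCH)); // [header_1, old_header_2]
    }
}
```

With the same pattern, the two modes select different members, which is why a bare /regex/ selector would have to fix one of them or expose the choice.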

gregsdennis avatar Dec 17 '24 06:12 gregsdennis