
[Enhancement] Support Illegal Character in Regex Name Group

Open RyanL1997 opened this issue 2 months ago • 0 comments

Description

Currently, the regex-based extraction commands (parse and rex), which use the Java regex library, cannot include special characters such as -, _, or @ in a named capture group that creates a new column in the result set. Here are some related issues:

  • https://github.com/opensearch-project/sql/issues/3944
  • https://github.com/opensearch-project/sql/issues/4467

PR https://github.com/opensearch-project/sql/pull/4434 unified the error handling for this case across the parse and rex commands. Here is the current behavior:

curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d'{
    "query": "source=accounts | rex field=email \"(?<username>[^@]+)@(?<domain_name>[^.]+)\" | fields email, username, domain_name | head 3"
  }' | jq

{
  "error": {
    "reason": "Invalid Query",
    "details": "Invalid capture group name 'domain_name'. Java regex group names must start with a letter and contain only letters and digits.",
    "type": "IllegalArgumentException"
  },
  "status": 400
}
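
The restriction in the error message comes from java.util.regex itself, which only accepts ASCII letters and digits (starting with a letter) in a group name. A minimal check, assuming nothing about the plugin's own code (the class and method names here are hypothetical):

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; not part of the SQL plugin.
final class GroupNameLimitDemo {
    // Returns true if java.util.regex accepts the pattern, false otherwise.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("(?<domainname>[^.]+)"));  // true
        System.out.println(compiles("(?<domain_name>[^.]+)")); // false: '_' is not allowed
        System.out.println(compiles("(?<domain-name>[^.]+)")); // false: '-' is not allowed
    }
}
```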

However, in https://github.com/opensearch-project/sql/pull/4434#issuecomment-3399182076, @ykmr1224 pointed out that we should be able to support the invalid characters by rewriting the regex and mapping the extracted values back to the original names.

Expected Behavior

e.g.: (?<user_name>.+)(?<username>.+)(?<username1>.+) => (?<username2>.+)(?<username>.+)(?<username1>.+), mapping = {username2 => user_name, username => username, username1 => username1}
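
One possible shape for the rewrite, as a rough sketch only. GroupNameRewriter, rewrite, and extract are hypothetical names, not existing plugin APIs. The sketch assumes group openers can be found with a simple scan; it does not handle escaped parentheses, character classes containing "(?<", or \k<name> backreferences, all of which a real implementation would need to cover.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a sketch of the idea, not the plugin's implementation.
final class GroupNameRewriter {
    // "(?<name>" openers; '=' and '!' are excluded so lookbehinds are skipped.
    private static final Pattern GROUP_OPENER = Pattern.compile("\\(\\?<([^>=!][^>]*)>");
    private static final Pattern LEGAL_NAME = Pattern.compile("[A-Za-z][A-Za-z0-9]*");

    final String regex;                       // rewritten regex that Java accepts
    final Map<String, String> safeToOriginal; // safe group name => user-facing name

    private GroupNameRewriter(String regex, Map<String, String> safeToOriginal) {
        this.regex = regex;
        this.safeToOriginal = safeToOriginal;
    }

    static GroupNameRewriter rewrite(String userRegex) {
        // Pass 1: reserve names that are already legal so they stay unchanged.
        Set<String> taken = new HashSet<>();
        Matcher scan = GROUP_OPENER.matcher(userRegex);
        while (scan.find()) {
            if (LEGAL_NAME.matcher(scan.group(1)).matches()) {
                taken.add(scan.group(1));
            }
        }
        // Pass 2: replace illegal names with unique legal ones.
        Map<String, String> mapping = new LinkedHashMap<>();
        StringBuilder out = new StringBuilder();
        Matcher m = GROUP_OPENER.matcher(userRegex);
        while (m.find()) {
            String original = m.group(1);
            String safe = original;
            if (!LEGAL_NAME.matcher(original).matches()) {
                String base = original.replaceAll("[^A-Za-z0-9]", "");
                if (base.isEmpty() || !Character.isLetter(base.charAt(0))) {
                    base = "g" + base; // group names must start with a letter
                }
                safe = base;
                // Append a counter until the name collides with nothing else.
                for (int i = 1; !taken.add(safe); i++) {
                    safe = base + i;
                }
            }
            mapping.put(safe, original);
            m.appendReplacement(out, Matcher.quoteReplacement("(?<" + safe + ">"));
        }
        m.appendTail(out);
        return new GroupNameRewriter(out.toString(), mapping);
    }

    // Runs the rewritten regex and reports values under the original names.
    Map<String, String> extract(String input) {
        Map<String, String> row = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            for (Map.Entry<String, String> e : safeToOriginal.entrySet()) {
                row.put(e.getValue(), m.group(e.getKey()));
            }
        }
        return row;
    }

    public static void main(String[] args) {
        GroupNameRewriter r = rewrite("(?<user_name>.+)(?<username>.+)(?<username1>.+)");
        System.out.println(r.regex);          // (?<username2>.+)(?<username>.+)(?<username1>.+)
        System.out.println(r.safeToOriginal); // {username2=user_name, username=username, username1=username1}

        // Made-up input value, used only for illustration.
        GroupNameRewriter rex = rewrite("(?<username>[^@]+)@(?<domain_name>[^.]+)");
        System.out.println(rex.extract("jane@example.com")); // {username=jane, domain_name=example}
    }
}
```

Reserving the already-legal names first keeps them stable, so only the illegal names are renamed. That may also make /_explain output easier to correlate with the user's query.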

Exit Criteria

  • Proper testing that covers all the edge cases of rewriting; see https://github.com/opensearch-project/sql/pull/4434#issuecomment-3395550765
  • Double-check the debugging flows (e.g. /_explain and the server log) to make sure the rewrite does not cause confusion
  • Performance testing to make sure there is no notable performance degradation
  • Update the documentation if the behavior changes (both parse and rex)

RyanL1997, Oct 14 '25 16:10