[BUG] Regex character groups can't end with a hyphen in `fn:matches()`

Open • amclark42 opened this issue 2 years ago • 2 comments

eXist returns an error, FORX0002, when the regular expression given to fn:matches() includes a character group ending in the hyphen character. The error message reads:

org.exist.xquery.XPathException: err:FORX0002 Conversion from XPath F&O 3.0 regular expression syntax to Java regular expression syntax failed [...] hyphen in a character range must be followed by a single character

In XPath, a hyphen is allowed at the beginning or end of a positive character group.

To Reproduce

xquery version "3.1";

module namespace t="http://exist-db.org/xquery/test";
declare namespace test="http://exist-db.org/xquery/xqsuite";

declare variable $t:post-hyphen-group := "[l-]";
declare variable $t:pre-hyphen-group := "[-l]";
declare variable $t:sample-string := "aww he--";

(: Match the sample string against a regex character group that ends with a 
  hyphen character. :)
declare
  %test:assertTrue
function t:matches-using-hyphen-postfix() {
  matches($t:sample-string, $t:post-hyphen-group)
};

(: Match against a character group that *leads* with the hyphen character. :)
declare
  %test:assertTrue
function t:matches-using-hyphen-prefix() {
  matches($t:sample-string, $t:pre-hyphen-group)
};

(: Instead of `fn:matches()`, try using the post-hyphen character group with 
  `fn:replace()`. The latter is implemented via Saxon in eXist v5.3.0
  (see https://github.com/eXist-db/exist/pull/3530). :)
declare
  %test:assertEquals('aww hecc')
function t:replace-using-hyphen-postfix-and-saxon() {
  replace($t:sample-string, $t:post-hyphen-group, 'c')
};

Context:

  • OS: macOS
  • eXist-db version: 5.3.0
  • OpenJDK 1.8.0_292

Additional context

  • How is eXist-db installed? JAR
  • Any custom changes in e.g. conf.xml? No

amclark42, Oct 22 '21 22:10

This bug would be sidestepped by feature request #3633, adopting Saxon’s implementation.

amclark42, Oct 22 '21 22:10

@amclark42 I can confirm the bug affects eXist 5.3.0.

In the meantime, escaping the trailing hyphen will work around the issue:

matches("aww he--", "[l\-]")

... returns true().
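
For anyone who wants to keep the workaround under test, here is a minimal XQSuite sketch in the same style as the module from the original report. The variable name `$t:escaped-hyphen-group` and the function name `t:matches-using-escaped-hyphen` are made up for illustration; the only behavior assumed is the `true()` result reported above for eXist 5.3.0.

```xquery
xquery version "3.1";

module namespace t="http://exist-db.org/xquery/test";
declare namespace test="http://exist-db.org/xquery/xqsuite";

(: Hypothetical variable: the same group as "[l-]" from the report, but with
  the trailing hyphen escaped. :)
declare variable $t:escaped-hyphen-group := "[l\-]";
declare variable $t:sample-string := "aww he--";

(: Match the sample string against the escaped form of the character group.
  The unescaped "[l-]" raises err:FORX0002 in eXist 5.3.0. :)
declare
  %test:assertTrue
function t:matches-using-escaped-hyphen() {
  matches($t:sample-string, $t:escaped-hyphen-group)
};
```

The function can also be pasted into the original test module next to `t:matches-using-hyphen-postfix()`, in which case the duplicate `$t:sample-string` declaration should be dropped.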

joewiz, Nov 29 '21 19:11