
Tokensregex error with operator "+" (plus)

Open jpi-seb opened this issue 3 years ago • 1 comments

Hi, I just encountered this error while trying the tokensregex syntax at http://corenlp.run/

  • Version 4.4.0
  • Example of a working pattern: the very* first? day of the tentacle
  • Example of a failing pattern: the very* first? day+ of the tentacle

It seems that the + character is escaped as \+ at some point of the process (see the error screenshot). If I try the pattern the very* first? day{1,} of the tentacle, it works as expected.

I also tried to parse the same pattern with the CoreNLP Java library in version 4.4.0, and it works without error with the "+" operator.

String strPattern = "the very* first? day+ of the tentacle";
		
TokenSequenceParser parser = new TokenSequenceParser();
Env env = new Env(parser);
env.initDefaultBindings();
Pair<PatternExpr, SequenceMatchAction<CoreMap>> p = parser.parseSequenceWithAction(env, strPattern);

// => parses without error

I don't know if the problem is just present on the http://corenlp.run/ online tester, or in a Java lib that I haven't tried.

jpi-seb avatar Mar 03 '22 10:03 jpi-seb

I've tried to avoid learning anything about JavaScript when I can help it, but in the server .js file, this looks incorrect to me:

  url: serverAddress + '/tokensregex?pattern=' + encodeURIComponent(
    pattern.replace("&", "\\&").replace('+', '\\+')) +

I would think that the whole point of encoding the pattern with encodeURIComponent is to escape all special characters, so a separate backslash-escaping of + and & beforehand shouldn't be necessary. At any rate, the server doesn't unescape a second time anywhere that I can see, so the patterns would be interpreted with a literal \ in them and not function correctly.
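To illustrate the point, here is a minimal sketch of what the extra escaping does to the failing pattern from the report. The helper names are hypothetical and this is not the actual corenlp.run client code; decodeURIComponent stands in for the server, on the assumption that the server percent-decodes the query parameter exactly once.

```javascript
// Hypothetical helpers for illustration only; not the real client code.
const pattern = "the very* first? day+ of the tentacle";

// Mirrors the snippet above: backslash-escape first, then percent-encode.
function buildQueryDoubleEscaped(p) {
  return encodeURIComponent(p.replace("&", "\\&").replace("+", "\\+"));
}

// Relies on encodeURIComponent alone, which already encodes + as %2B
// and & as %26.
function buildQueryEncodedOnce(p) {
  return encodeURIComponent(p);
}

// Assumption: the server percent-decodes once, modeled by decodeURIComponent.
const doubleEscaped = decodeURIComponent(buildQueryDoubleEscaped(pattern));
const encodedOnce = decodeURIComponent(buildQueryEncodedOnce(pattern));

console.log(doubleEscaped); // the very* first? day\+ of the tentacle
console.log(encodedOnce);   // the very* first? day+ of the tentacle
```

Under that assumption, the first form hands the parser a literal \+ instead of the + quantifier, which matches the escaped character described in the report. As a side note, replace with a string argument only touches the first occurrence, so a pattern with two + characters would be escaped inconsistently as well.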

https://github.com/stanfordnlp/CoreNLP/commit/8413fa1fc432aa2a13cbb4a296352bb9bad4d0cb



AngledLuffa avatar Mar 03 '22 19:03 AngledLuffa