grammarinator

ANTLR's NOT for chars

Open 38b394ce01 opened this issue 2 years ago • 1 comments

Changed the following two things using the NOT feature in ANTLR:

  • Unescape ' because it has to be escaped in ANTLR (e.g. Comment: ~'\'';). This is done with .replace("\'", "'"). Without this patch, the code uses the second char of the literal, which is the backslash, instead of the quote character.
  • Added Unicode support. This is done with decode("unicode-escape", "strict"). Without this patch, any Unicode escape also results in a backslash.

Note: Using NOT on a string is still buggy. Only the first char of the string is used, not the whole string. Fixing this would require a bigger change in higher-level parts of the code.

38b394ce01, Mar 24 '22 17:03

Decoding with unicode-escape resolves all of the escaping problems, so there is no need for e.g. .replace("\'", "'"). This works with all single-char rules like Comment: ~'\''; or Newline: ~'\n'; or Backslash: ~'\\'; or Unicode1: ~'\u0061'; and with strings longer than 1 char like Unicode2: ~'\u0061bc'; or Unicode2: ~'1\u0061bc';.
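A minimal sketch of the decoding step, to show what unicode-escape does with the literals listed above. The helper name unescape_antlr_literal is hypothetical and is not taken from the grammarinator code. Since a Python 3 str has no decode method, the sketch assumes the literal body is ASCII and encodes it to bytes first.

```python
def unescape_antlr_literal(literal):
    """Hypothetical helper: resolve escapes in a quoted ANTLR literal.

    `literal` is the raw token text including the surrounding single
    quotes, e.g. the four characters '\\'' for an escaped quote.
    """
    body = literal[1:-1]  # strip the surrounding single quotes
    # Assumption: the body is pure ASCII, so it can be encoded to bytes
    # and then decoded with the unicode-escape codec in strict mode.
    return body.encode("ascii").decode("unicode-escape", "strict")


# Raw strings keep the backslashes exactly as they appear in the grammar.
samples = [r"'\''", r"'\n'", r"'\\'", r"'\u0061'", r"'\u0061bc'", r"'1\u0061bc'"]
decoded = [unescape_antlr_literal(s) for s in samples]
```

With this approach a single codec call covers the quote, newline, backslash and \uXXXX cases, which is why a separate replace for the quote is not needed.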

Limitations:

  1. Unfortunately, the ASCII range used covers printable chars only, so chars like \n or \r are never generated. The first char is 0x20, the ASCII space.
  2. Does not work with chars outside the ASCII range; only ASCII chars encoded as Unicode escapes will work.

None of these limitations can be resolved by the code lines I changed here. So with this change, only ' and ASCII chars encoded as Unicode escapes will work, which was not the case before.
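The first limitation can be illustrated with a small sketch of how a NOT set behaves when candidates are drawn from printable ASCII only. The names PRINTABLE_ASCII and not_char_candidates are hypothetical, and the upper bound 0x7E is an assumption; the thread only states that the range starts at 0x20.

```python
# Assumption: the candidate range is 0x20 (space) through 0x7E (~).
PRINTABLE_ASCII = [chr(code) for code in range(0x20, 0x7F)]


def not_char_candidates(excluded):
    """Hypothetical helper: chars a generator could pick for ~'...'.

    `excluded` holds the already unescaped chars of the negated literal.
    """
    return [c for c in PRINTABLE_ASCII if c not in excluded]


candidates = not_char_candidates("'")
```

Because the pool never contains control chars, a rule such as ~'\'' can never produce \n or \r, whatever the unescaping step does.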

38b394ce01, Mar 25 '22 09:03

Thanks for the PR, but the issue was resolved as part of a bigger improvement around escapes (#75).

If you find that the landed PR did not solve all the limitations, please, do open a new issue or PR.

renatahodovan, Mar 08 '23 09:03