ocaml-re

Character class ranges can be passed in reverse order, meaning that `Re.Pcre` is not Perl-compatible

Open · nolenroyalty opened this issue 4 years ago • 1 comment

Cset.seq compares its two arguments and, if the first argument is greater than the second argument, reverses their order: https://github.com/ocaml/ocaml-re/blob/master/lib/cset.ml#L74
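To make the behavior concrete, here is a simplified, self-contained model of that normalization. It is a sketch only, not the library's source: the `cset` type as a list of inclusive code-point ranges and the `mem` helper are assumptions made for illustration.

```ocaml
(* Simplified model of the behaviour described above; not the library's source.
   A character set is modelled as a list of inclusive (low, high) code-point
   ranges. *)
type cset = (int * int) list

(* Build a single-range set, silently swapping reversed bounds. *)
let seq (c : int) (c' : int) : cset =
  if c <= c' then [ (c, c') ] else [ (c', c) ]

(* Illustrative helper: membership test over the list of ranges. *)
let mem (x : int) (s : cset) : bool =
  List.exists (fun (lo, hi) -> lo <= x && x <= hi) s

let () =
  let reversed = seq (Char.code '9') (Char.code '0') in
  (* The reversed range is indistinguishable from the well-formed one. *)
  assert (reversed = seq (Char.code '0') (Char.code '9'));
  assert (mem (Char.code '1') reversed)
```

Because the swap happens when the set is built, nothing downstream can tell that the pattern was written backwards.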

This means that Re.Pcre.regexp will accept arguments like [9-0], which are automatically converted to [0-9]. This is inconsistent with Perl's behavior, e.g.:

$ echo 1 | perl -pe 's/[0-9]/foo/'
foo
$ echo 1 | perl -pe 's/[9-0]/foo/'
Invalid [] range "9-0" in regex; marked by <-- HERE in m/[9-0 <-- HERE ]/ at -e line 1.

(I'm not very familiar with the codebase, but it seems like another place where this could be addressed is in perl.ml, where character classes are parsed, by adding some validation before creating a range.)

This is a pretty minor bug, but it can result in confusing behavior for typos like [0- 9], which Re.Pcre.regexp will parse happily.
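A validation step of the kind suggested above might look roughly like the sketch below. The names `Invalid_range`, `checked_range` and `accepts` are hypothetical and do not exist in ocaml-re; the real parser in perl.ml would need to report the error through whatever mechanism it already uses.

```ocaml
(* Hypothetical helper, not part of ocaml-re: reject a reversed range
   instead of silently normalising it. *)
exception Invalid_range of char * char

let checked_range (lo : char) (hi : char) : (int * int) list =
  if Char.code lo > Char.code hi then raise (Invalid_range (lo, hi))
  else [ (Char.code lo, Char.code hi) ]

(* Illustrative wrapper: true when the range is well-formed. *)
let accepts (lo : char) (hi : char) : bool =
  match checked_range lo hi with
  | _ -> true
  | exception Invalid_range _ -> false

let () =
  assert (accepts '0' '9');
  assert (not (accepts '9' '0'));
  (* A class written as "[0- 9]" contains the range '0'-' ', which is
     reversed because ' ' sorts before '0'. *)
  assert (not (accepts '0' ' '))
```

Doing the check at parse time rather than inside `Cset.seq` would leave other callers of `Cset.seq` unaffected, assuming any of them rely on the swap.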

nolenroyalty, Apr 27 '21 16:04

Thanks, that looks like a bug indeed. Would you mind sending a PR to do the validation? What about Re.Posix, does it suffer from the same problem?

rgrinberg, Jun 29 '21 03:06