ocaml-re
Character class ranges can be passed in reverse order, meaning that `Re.Pcre` is not Perl-compatible
Cset.seq compares its two arguments and, if the first argument is greater than the second argument, reverses their order: https://github.com/ocaml/ocaml-re/blob/master/lib/cset.ml#L74
This means that Re.Pcre.regexp will accept arguments like [9-0], which are automatically converted to [0-9]. This is inconsistent with Perl's behavior, e.g.:
```
$ echo 1 | perl -pe 's/[0-9]/foo/'
foo
$ echo 1 | perl -pe 's/[9-0]/foo/'
Invalid [] range "9-0" in regex; marked by <-- HERE in m/[9-0 <-- HERE ]/ at -e line 1.
```
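For contrast, here is a rough standalone model of the reordering behavior described for Cset.seq. It is a simplified sketch and not the library source: the `cset` type and the `mem` helper are assumptions made for illustration, and only the "swap the bounds when reversed" logic is taken from the description above.

```ocaml
(* Simplified stand-in for the behavior described above; not the library source.
   A character set is modeled as a list of inclusive (lo, hi) code ranges. *)
type cset = (int * int) list

(* Build a one-range set, silently swapping the bounds when they are reversed. *)
let seq (i : int) (j : int) : cset =
  if i <= j then [ (i, j) ] else [ (j, i) ]

(* Illustrative membership test over the modeled set. *)
let mem (c : char) (s : cset) : bool =
  let n = Char.code c in
  List.exists (fun (lo, hi) -> lo <= n && n <= hi) s

let () =
  let reversed = seq (Char.code '9') (Char.code '0') in
  (* The reversed range ends up identical to [0-9]. *)
  assert (reversed = seq (Char.code '0') (Char.code '9'));
  assert (mem '1' reversed)
```

Under this model the caller never learns that the bounds were reversed, which is why [9-0] matches the same input as [0-9].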
(I'm not very familiar with the codebase, but it seems like another place where this could be addressed is in perl.ml, when it parses character classes, by adding some validation before creating a range.)
This is a pretty minor bug, but it can result in confusing behavior for typos like [0- 9] (where "0- " is itself a reversed range, from 0 down to the space character), which Re.Pcre.regexp will parse happily.
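The validation suggested above could look roughly like the sketch below. `checked_range` is a hypothetical helper, not an existing ocaml-re function, and the error text only imitates Perl's message. How it would be wired into the perl.ml parser is left open.

```ocaml
(* Hypothetical helper, not part of ocaml-re: reject a reversed range
   instead of reordering it, imitating Perl's "Invalid [] range" error. *)
let checked_range (lo : char) (hi : char) : (int * int, string) result =
  if lo <= hi then Ok (Char.code lo, Char.code hi)
  else Error (Printf.sprintf "Invalid [] range \"%c-%c\"" lo hi)

let () =
  (* A well-ordered range is accepted and returned as character codes. *)
  assert (checked_range '0' '9' = Ok (48, 57));
  (* A reversed range is reported rather than silently swapped. *)
  assert (checked_range '9' '0' = Error "Invalid [] range \"9-0\"");
  (* The typo [0- 9] contains the reversed range "0- " and is rejected too. *)
  assert (checked_range '0' ' ' = Error "Invalid [] range \"0- \"")
```

Returning a `result` keeps the choice of how to report the failure (exception, parse error, and so on) with the caller, which seems like the safer default for a sketch.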
Thanks, that looks like a bug indeed. Would you mind sending a PR to do the validation? What about Re.Posix, does it suffer from the same problem?