ocaml-re

Re.Posix does not implement character class expressions?

Open lindig opened this issue 1 year ago • 4 comments

lindig avatar Aug 26 '22 10:08 lindig

utop # let re = Re.Posix.compile_pat {|a[:space:]b|};;
utop # Re.exec re "a b" |> Re.Group.all;;
Exception: Not_found.
utop # Re.exec re "a:b" |> Re.Group.all;;
- : string array = [|"a:b"|]

It appears that Re.Posix does not implement character class expressions like [:space:], which are part of the language described in the specification that the interface cites as its documentation:

  • https://pubs.opengroup.org/onlinepubs/007908799/xbd/re.html

Unless this is a misunderstanding, I would prefer more explicit documentation of which language is implemented, or a clear warning upfront about the exceptions. Since there is no warning when character class expressions are used, this may come as a surprise.

lindig avatar Aug 26 '22 10:08 lindig

I think you need:

[[:space:]]

instead of just:

[:space:]

Character class expressions are meant to be used inside bracket expressions.
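
To illustrate why the original pattern matched "a:b" but not "a b": outside an enclosing bracket expression, [:space:] is read as an ordinary bracket expression whose members are the literal characters ':', 's', 'p', 'a', 'c' and 'e'. The sketch below only models that reading with the standard library; in_bracket_set is a hypothetical helper, not part of Re.

```ocaml
(* Hypothetical model of a simple bracket expression: the text between
   '[' and ']' is treated as a plain set of literal characters. *)
let in_bracket_set (set : string) (c : char) : bool = String.contains set c

let () =
  (* "[:space:]" has the body ":space:", so ':' is a member of the set... *)
  assert (in_bracket_set ":space:" ':');
  (* ...while an actual space character is not. *)
  assert (not (in_bracket_set ":space:" ' '))
```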

bcc32 avatar Aug 26 '22 16:08 bcc32

Thank you! That makes a lot of sense - so it is a misunderstanding on my part - will test it:

utop # let re = Re.Posix.compile_pat {|a[[:space:]]b|};;
Exception: Re__Posix.Not_supported.

This is re 1.10.4. An explicit error is an improvement, but it still looks like this is not implemented.

lindig avatar Aug 26 '22 16:08 lindig

Ah, indeed, that does not appear to be implemented. It is, however, available in Re.Perl and Re.Pcre. The code could probably be shared between them, since I think they use the same character class names.

I see the following code in posix.ml, ha:

      else if accept ':' then begin
        raise Not_supported (*XXX*)

I suppose that's basically a "TODO"?
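
Until that is implemented, a possible workaround is sketched below. It assumes the re library is loaded (for example with #require "re";; in utop) and that Perl-style syntax is acceptable for the pattern in question; it also shows the same match built from Re combinators, which avoids pattern parsing entirely. The names re_perl and re_comb are only illustrative.

```ocaml
(* Requires the `re` library. *)

(* Perl-style syntax: the character class sits inside a bracket expression. *)
let re_perl = Re.Perl.compile_pat {|a[[:space:]]b|}

(* The same pattern built from combinators, so no pattern string is parsed. *)
let re_comb = Re.compile (Re.seq [ Re.char 'a'; Re.space; Re.char 'b' ])

let () =
  (* Both accept a whitespace character between 'a' and 'b'... *)
  assert (Re.execp re_perl "a b");
  assert (Re.execp re_comb "a b");
  (* ...and neither treats ':' as whitespace. *)
  assert (not (Re.execp re_perl "a:b"));
  assert (not (Re.execp re_comb "a:b"))
```

Note that Perl and POSIX syntaxes differ in other respects, so swapping the parser is not a drop-in replacement for every pattern.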

bcc32 avatar Aug 26 '22 18:08 bcc32