
Re.split odd behaviour with separator at beginning/end

Open darioteixeira opened this issue 9 years ago • 3 comments

Consider the following toplevel session:

# let rex = Re.(compile (alt [char '\n'; str "\r\n"]));;
val rex : Re.re = <abstr>
# Re.split rex "hello\n\nworld";;
- : string list = ["hello"; ""; "world"]
# Re.split rex "\nhello\n\nworld\n";;
- : string list = ["hello"; ""; "world"]
# Re.split rex "\n\nhello\n\nworld\n\n";;
- : string list = [""; "hello"; ""; "world"; ""]

I understand that Re.split's proper behaviour in this case -- when a separator occurs at the very beginning or at the very end of a string -- is open to discussion. Nevertheless, the currently implemented behaviour as shown above strikes me as odd: if the number of separators at the beginning is 0, 1, and 2, the number of empty elements will be 0, 0, and 1, respectively.

I think it makes more sense for a single separator at the start to produce a list whose first element is empty. Likewise, a single separator at the end should produce an empty last element. In other words, if the number of separators at the beginning is 0, 1, and 2, the number of empty elements should also be 0, 1, and 2, respectively.
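To make the proposal concrete, here is a minimal stdlib-only sketch of the semantics described above. The function name `split_keep_empty` is hypothetical and this is not how ocaml-re is implemented; it only hard-codes the same separator as the example (`"\n"` or `"\r\n"`) and keeps every empty field, including leading and trailing ones.

```ocaml
(* Hypothetical reference splitter (not part of ocaml-re): splits on "\n"
   or "\r\n" and keeps every empty field, leading and trailing included. *)
let split_keep_empty (s : string) : string list =
  let n = String.length s in
  (* [start] is where the current field begins; [i] is the scan position. *)
  let rec go start i acc =
    if i >= n then
      (* End of input: emit the last field, which may be empty. *)
      List.rev (String.sub s start (n - start) :: acc)
    else if s.[i] = '\n' then
      go (i + 1) (i + 1) (String.sub s start (i - start) :: acc)
    else if s.[i] = '\r' && i + 1 < n && s.[i + 1] = '\n' then
      go (i + 2) (i + 2) (String.sub s start (i - start) :: acc)
    else go start (i + 1) acc
  in
  go 0 0 []
```

With this sketch, 0, 1, and 2 leading separators yield 0, 1, and 2 leading empty elements, and the same holds at the end of the string.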

I've encountered this issue in practice while porting from OCaml-pcre to OCaml-re, and coding around it is a major PITA.

darioteixeira, Nov 30 '15 18:11

Incidentally, the behaviour of the PCRE library is more useful in practice, though also a bit quirky: for separators at the beginning of a string it behaves like my proposal above; however, any number of separators at the end are simply discarded.
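A rough sketch of the behaviour just described, using only the standard library: leading empty fields are kept, trailing ones are dropped. The names `drop_trailing_empty` and `pcre_like_split` are hypothetical, the separator is assumed to be a single `'\n'`, and the code models only the description in this comment, not OCaml-pcre itself.

```ocaml
(* Remove every empty string from the end of a list of fields. *)
let drop_trailing_empty (fields : string list) : string list =
  let rec strip = function
    | "" :: rest -> strip rest
    | l -> l
  in
  List.rev (strip (List.rev fields))

(* Hypothetical model of the described behaviour: split on '\n' keeping
   all fields, then discard the empty fields at the end. *)
let pcre_like_split (s : string) : string list =
  drop_trailing_empty (String.split_on_char '\n' s)
```

Under these assumptions, `pcre_like_split "\nhello\n\nworld\n\n"` keeps the leading empty string but ends with `"world"`.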

darioteixeira, Nov 30 '15 18:11

Agreed (having just been stung by this as well). It's surprising that let split_on_char c = Re.(split (compile (char c))) is not equivalent to OCaml's String.split_on_char: besides the examples above, passing "" to Re.split returns an empty list.
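For comparison, this is what the standard library function does on the same kind of input (stdlib only, so it runs without ocaml-re; the binding names are just for illustration):

```ocaml
(* String.split_on_char keeps empty fields at both ends of the input. *)
let fields = String.split_on_char '\n' "\nhello\n\nworld\n"
(* fields = [""; "hello"; ""; "world"; ""] *)

(* The empty string yields one empty field, not an empty list. *)
let empty_input = String.split_on_char '\n' ""
(* empty_input = [""] *)
```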

dra27, Nov 14 '22 10:11

#233 adds a Re.split_delim function which behaves as you expect.

vouillon, Dec 04 '23 14:12