text-icu

Unexpected exception and results with unmatched prefix (or suffix)

[Open] dylex opened this issue 8 years ago • 0 comments

The pure versions of the regex match extraction functions (Text.ICU.prefix, suffix, and possibly group) do not correctly handle a group that appears in the regex but does not participate in the match, for example "a(b)?c" matched against "ac", or "(a)|b" matched against "b". They assume that start_ and end_ return -1 only when the group index is out of range, but in fact they also return -1 when a group does not participate in the match.
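As a minimal sketch of the distinction above, the three cases a group lookup can hit can be separated explicitly. This is a hypothetical pure model, not text-icu's code: it assumes raw offsets follow ICU's convention of reporting -1 for a group that exists in the pattern but did not take part in the match, and GroupSpan and classify are illustrative names only.

```haskell
-- Hypothetical model of a capture-group lookup; not text-icu's code.
-- Assumption: as in ICU, a group that exists in the pattern but did
-- not participate in the match is reported with offsets (-1, -1).
data GroupSpan
  = OutOfRange       -- index n is not a group in the pattern
  | Unmatched        -- group exists but did not participate
  | Matched Int Int  -- start and end offsets of the group's text
  deriving (Eq, Show)

-- | Classify group n, given raw (start, end) offsets for groups
-- 0..groupCount, where index 0 is the whole match.
classify :: [(Int, Int)] -> Int -> GroupSpan
classify raw n
  | n < 0 || n >= length raw = OutOfRange
  | s < 0 || e < 0           = Unmatched
  | otherwise                = Matched s e
  where
    (s, e) = raw !! n
```

For "abc(def)?ghi" against "xabcghiy", the raw spans would be [(1, 7), (-1, -1)], so group 1 classifies as Unmatched while group 2 is OutOfRange; the pure functions described here only guard against the OutOfRange case.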

> prefix 1 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
*** Exception: Data.Text.Array.new: size overflow
CallStack (from HasCallStack):
  error, called at ./Data/Text/Array.hs:129:20 in text-1.2.2.1-FeA6fTH3E2n883cNXIS2Li:Data.Text.Array
> suffix 1 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
Just "\NULxabcghiy"

An out-of-range group index gives the expected results:

> prefix 2 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
Nothing
> suffix 2 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
Nothing

group possibly does the right thing, but not for the right reason (it extracts the range from -1 to -1), and perhaps it should return Nothing instead:

> group 1 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
Just ""

One solution would be to use the safe underlying start and end functions instead, and return Nothing whenever either of them returns Nothing. Happy to submit a PR for this approach.
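The proposed approach can be sketched in pure terms. This is a minimal model under stated assumptions, not a patch against text-icu: Match, spanOf, groupM, prefixM and suffixM are hypothetical names, each group's offsets are stored as the Maybe values a safe start/end lookup would produce, and offsets are treated as character indices into the Text (ICU reports UTF-16 code-unit offsets, which coincide only when the text has no characters outside the Basic Multilingual Plane).

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical stand-in for a match: the searched text plus, for each
-- group 0..groupCount, the offsets a "safe" start/end lookup would give
-- (Nothing when the group did not participate in the match).
data Match = Match
  { matchText  :: Text
  , matchSpans :: [Maybe (Int, Int)]
  }

-- | Offsets of group n, or Nothing if n is out of range or unmatched.
spanOf :: Int -> Match -> Maybe (Int, Int)
spanOf n m
  | n < 0 || n >= length (matchSpans m) = Nothing
  | otherwise                           = matchSpans m !! n

-- | Text matched by group n.
groupM :: Int -> Match -> Maybe Text
groupM n m = do
  (s, e) <- spanOf n m
  return (T.take (e - s) (T.drop s (matchText m)))

-- | Text before the start of group n.
prefixM :: Int -> Match -> Maybe Text
prefixM n m = do
  (s, _) <- spanOf n m
  return (T.take s (matchText m))

-- | Text after the end of group n.
suffixM :: Int -> Match -> Maybe Text
suffixM n m = do
  (_, e) <- spanOf n m
  return (T.drop e (matchText m))
```

On the issue's example (spans [Just (1, 7), Nothing] over "xabcghiy"), prefixM 1 and suffixM 1 then return Nothing instead of crashing or returning garbage, and groupM 1 returns Nothing as well. That leaves Just "" to mean a group that really matched the empty string, for instance (d*) in place of (def)?.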

dylex, May 22 '17 19:05