text-icu
text-icu copied to clipboard
Unexpected exception and results with unmatched prefix (or suffix)
The pure versions of regex match extraction functions, Text.ICU.prefix, suffix, and (possibly) group do not correctly handle the case where a group is in a regex but is not used in a match. For example "a(b)?c" against "ac" or "(a)|b" against "b". They assume that start_ and end_ return -1 only when the grouping is out of range, but in fact they can when a grouping does not fire.
> prefix 1 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
*** Exception: Data.Text.Array.new: size overflow
CallStack (from HasCallStack):
error, called at ./Data/Text/Array.hs:129:20 in text-1.2.2.1-FeA6fTH3E2n883cNXIS2Li:Data.Text.Array
> suffix 1 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
Just "\NULxabcghiy"
An out of bounds range gives the expected results:
> prefix 2 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
Nothing
> suffix 2 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
Nothing
group possibly does right thing, but not for the right reason (it extracts -1 to -1), and perhaps should return Nothing instead:
> group 1 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
Just ""
One solution would be to use the safe underlying start and end functions instead, returning Nothing for any underlying Nothing. Happy to submit a PR for this approach.