
Add a function to split text at the first occurrence of a character

Open vsannier opened this issue 11 months ago • 4 comments

I was surprised not to find a function in the library to split a text on the first match of a character/text/predicate, with the matched part discarded. This is not difficult to write using breakOn and drop, but I think a function implemented with the internals of the library might have better performance.

split1 :: Char -> Text -> (Text, Text)
split1 c = (id *** drop 1) . (breakOn $ singleton c)

See a similar function (previously called splitOnce) in the byteslice package, and a question on Stack Overflow about splitAtfirst.

vsannier avatar Jan 12 '25 10:01 vsannier

IMHO split1 c = fmap (drop 1) . break (== c) is short enough and does not suffer from any significant performance penalty: drop 1 is constant-time, negligible in comparison to the linear time of break.
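For readers who want to try that one-liner, here is a minimal standalone sketch. It assumes a qualified import of Data.Text as T and the OverloadedStrings extension for the literals; the name split1 is taken from the thread, and the main action is only an illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- | Split at the first occurrence of the character, dropping the match itself.
-- When the character is absent, the whole input ends up in the first component.
split1 :: Char -> Text -> (Text, Text)
split1 c = fmap (T.drop 1) . T.break (== c)

main :: IO ()
main = do
  print (split1 ':' "key:value:rest")  -- ("key","value:rest")
  print (split1 ':' "no separator")    -- ("no separator","")
  print (split1 ':' ":leading")        -- ("","leading")
```

Here fmap acts on the second component of the pair, so only the remainder loses its leading separator.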

Note that the function from byteslice has a different signature, with Maybe.
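To illustrate what a Maybe signature buys, here is a rough sketch of such a variant for Text. The name split1Maybe is hypothetical, and this is not the byteslice implementation; it only assumes that Nothing is meant to signal "separator not found".

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- | Hypothetical Maybe-returning variant: Nothing when the character is absent.
split1Maybe :: Char -> Text -> Maybe (Text, Text)
split1Maybe c t
  | T.null rest = Nothing                        -- no occurrence of c in t
  | otherwise   = Just (before, T.drop 1 rest)   -- rest starts with c, so drop it
  where
    (before, rest) = T.break (== c) t

main :: IO ()
main = do
  print (split1Maybe '=' "a=b=c")  -- Just ("a","b=c")
  print (split1Maybe '=' "abc")    -- Nothing
  print (split1Maybe '=' "abc=")   -- Just ("abc","")
```

The tuple-returning version maps both "abc" and "abc=" to ("abc",""), so the Maybe result is what lets a caller tell a trailing separator apart from a missing one.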

Bodigrim avatar Jan 13 '25 01:01 Bodigrim

@Bodigrim comparing

split1 c = fmap (drop 1) . (breakOn $ singleton c)

vs

split1 c = fmap (drop 1) . break (== c)

The first option, using breakOn, should be significantly faster than the second option, no?

Rationale:

breakOn can look at the needle and choose the optimal strategy (specifically, if c is in the ASCII range, it can just do a sequential scan).

break, on the other hand, always needs to decode to Char.
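The two definitions being compared can be put side by side as in the sketch below. It only checks that they return the same results on a few ordinary sample inputs and does not measure performance; the names viaBreakOn and viaBreak and the sample list are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- | Variant built on breakOn with a one-character needle.
-- The needle is never empty, so breakOn's empty-needle error cannot occur.
viaBreakOn :: Char -> Text -> (Text, Text)
viaBreakOn c = fmap (T.drop 1) . T.breakOn (T.singleton c)

-- | Variant built on break with a Char predicate.
viaBreak :: Char -> Text -> (Text, Text)
viaBreak c = fmap (T.drop 1) . T.break (== c)

main :: IO ()
main = do
  let samples :: [(Char, Text)]
      samples =
        [ (':', "key:value")
        , ('é', "café au lait")
        , ('x', "none here")
        , ('λ', "")
        ]
  -- Prints one Bool per sample: True when both variants agree.
  mapM_ (\(c, t) -> print (viaBreakOn c t == viaBreak c t)) samples
```

Since the results match on inputs like these, the choice between the two comes down to the speed question raised above, which would need a proper benchmark to settle.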

sol avatar Apr 05 '25 14:04 sol

Corresponding code: https://github.com/haskell/text/blob/5e57460711a9a5ab7f8a30f0e11cd850018dae70/src/Data/Text/Internal/Search.hs#L49-L56

https://github.com/haskell/text/blob/5e57460711a9a5ab7f8a30f0e11cd850018dae70/src/Data/Text/Internal/Search.hs#L93-L100

sol avatar Apr 05 '25 14:04 sol

@sol yes, I'd expect breakOn to be faster.

Bodigrim avatar Apr 05 '25 15:04 Bodigrim