text
text copied to clipboard
Text: add fun isSubsequenceOf
Closes report: #368
Function to detect ordered subsequences in the content.
I currently do not have enough knowledge of low-level and high performance Haskell. But would learn and try to improve.
Currently, next I would cover the supplementary quality required: tests, benchmarks and things that good library piece requires. Having that would look into how further optimization.
Thank you for your work and reviews.
I didn't test this (and so I wouldn't trust it to actually be correct!), but just to give you a rough idea of how a more efficient implementation might look, something like this might be a good starting point:
isSubsequenceOf :: Text -> Text -> Bool
isSubsequenceOf (Text needle o1 l1) (Text target o2 l2) = go o1 o2
where
go :: Int -> Int -> Bool
go i _ | i >= l1 = True
go _ j | j >= l2 = False
go !i !j =
let
!(Iter c d) = iterArray needle i
!(Iter c' d') = iterArray target j
in
if c == c'
then go (i + d) (j + d')
else go i (j + d')
There is plenty of good material to steal from in the rest of the library too if you want to see how to work with the underlying array. Getting tests and benchmarks going before figuring this out would be great though!
uh, sorry, I didn't see others' reviews when I was writing mine, and it looks like most of my comments have been mentioned already. Sorry for the duplicate review, but I hope I have something to add nonetheless.
I would accept the implementation mentioned in an earlier comment:
isSubsequenceOf needles haystack = List.isSubsequenceOf (T.unpack needles) (T.unpack haystack)
If you want to stick to the current implementation,
- the order of argument should be the same as
Data.List.isSubsequenceOf
- the auxiliary
subseqOf
should avoid doing redundantuncons
when the needle doesn't change. This can be done by specializing it to a non-empty needle (represented as aChar
andText
argument
Compared to using List.isSubsequenceOf
, a well optimized implementation would have the advantage of not allocating intermediate data structures (I don't think you need any unsafe primitives, GHC's worker-wrapper + inlining should be sufficient).