text icon indicating copy to clipboard operation
text copied to clipboard

Text: add fun isSubsequenceOf

Open Anton-Latukha opened this issue 2 years ago • 4 comments

Closes report: #368

Function to detect ordered subsequences in the content.

Anton-Latukha avatar Sep 25 '21 08:09 Anton-Latukha

I currently do not have enough knowledge of low-level and high performance Haskell. But would learn and try to improve.

Currently, next I would cover the supplementary quality required: tests, benchmarks and things that good library piece requires. Having that would look into how further optimization.

Thank you for your work and reviews.

Anton-Latukha avatar Sep 27 '21 09:09 Anton-Latukha

I didn't test this (and so I wouldn't trust it to actually be correct!), but just to give you a rough idea of how a more efficient implementation might look, something like this might be a good starting point:

isSubsequenceOf :: Text -> Text -> Bool
isSubsequenceOf (Text needle o1 l1) (Text target o2 l2) = go o1 o2
  where
    go :: Int -> Int -> Bool
    go i _ | i >= l1 = True
    go _ j | j >= l2 = False
    go !i !j =
      let
        !(Iter c d) =  iterArray needle i
        !(Iter c' d') = iterArray target j
      in
        if c == c'
          then go (i + d) (j + d')
          else go i (j + d')

There is plenty of good material to steal from in the rest of the library too if you want to see how to work with the underlying array. Getting tests and benchmarks going before figuring this out would be great though!

Boarders avatar Sep 28 '21 21:09 Boarders

uh, sorry, I didn't see others' reviews when I was writing mine, and it looks like most of my comments have been mentioned already. Sorry for the duplicate review, but I hope I have something to add nonetheless.

ghost avatar Jun 22 '22 01:06 ghost

I would accept the implementation mentioned in an earlier comment:

isSubsequenceOf needles haystack = List.isSubsequenceOf (T.unpack needles) (T.unpack haystack)

If you want to stick to the current implementation,

  • the order of argument should be the same as Data.List.isSubsequenceOf
  • the auxiliary subseqOf should avoid doing redundant uncons when the needle doesn't change. This can be done by specializing it to a non-empty needle (represented as a Char and Text argument

Compared to using List.isSubsequenceOf, a well optimized implementation would have the advantage of not allocating intermediate data structures (I don't think you need any unsafe primitives, GHC's worker-wrapper + inlining should be sufficient).

Lysxia avatar Feb 07 '23 15:02 Lysxia