containers icon indicating copy to clipboard operation
containers copied to clipboard

Binary search for Data.Sequence

Open jamespayor opened this issue 3 years ago • 4 comments

Data.Sequence as a data structure supports binary search, but it looks like no methods are currently exposed for doing so.

(Well, not exactly binary search, but you know what I mean - search primitives that take O(log n) time given a monotone predicate.)

In particular, I'd love to have something like this in the API:

-- Splits the sequence at a point where the predicate becomes true.
-- When the predicate is monotone, the split occurs at the first such point.
splitL, splitR :: (a -> Bool) -> Seq a -> (Seq a, Seq a)

... or perhaps something more like:

-- Returns the item in the sequence at which the predicate becomes true.
-- The predicate is passed the item, and the numbers of elements on either side of it in the sequence.
searchL, searchR :: (Int -> a -> Int -> Bool) -> Seq a -> Maybe (Seq a, a, Seq a)

An obvious use case is for querying sorted data, stored in a Seq. But there are also more cunning things you can do, whenever the accumulated data is monotone.

(I know I can go and use Data.FingerTree for this sort of thing, but its constants are way worse than Seq, and I nevertheless do keep wanting to be able to search Seqs.)

jamespayor avatar Mar 06 '21 02:03 jamespayor

splitL, splitR :: (a -> Bool) -> Seq a -> (Seq a, Seq a)

https://hackage.haskell.org/package/containers-0.6.5.1/docs/Data-Sequence.html#v:spanl ?

searchL, searchR :: (Int -> a -> Int -> Bool) -> Seq a -> Maybe (Seq a, a, Seq a)

in the API's naming style, this would be span(l/r)WithIndex, which does not exist right now.

Some functions have _WithIndex variants (e.g., map), some do not (e.g., filter?)

jwaldmann avatar Jul 05 '22 12:07 jwaldmann

Those do linear scans. At the time I lodged this issue, I was interested in binary search for a Seq whose elements are in some kind of sorted order.

jamespayor avatar Jul 05 '22 14:07 jamespayor

I see. So, something like https://hackage.haskell.org/package/containers-0.6.5.1/docs/Data-Set-Internal.html#v:spanAntitone

Seq whose elements are in some kind of sorted order

then why not use Data.Set (can you give an example where that does not work)

Data.Fingertree .. constants way worse than Seq

Wait, what? Data.Sequence implementation is a finger tree.

[EDIT] The only difference (from Data.Fingertree to Data.Sequence) is that the polymorphic measure component is specialized to Int (size)? An ideal compiler should be able to generate efficient code, and not require manual specialisation? Well, then maybe this can serve as a GHC test case.

jwaldmann avatar Jul 05 '22 19:07 jwaldmann

Fwiw I'm no longer interested in pushing this issue! You can close it if it seems an unwelcome addition to the API, or unwelcome work.

Aside re Fingertree vs Seq constants: when I benchmarked back in the day there was a large difference. I bet the specialization and unpacking of the measure Int counts for a lot in code size and indirection, but I'm not sure this explains what I saw, or if the benchmarks would still see a large discrepancy.

jamespayor avatar Jul 09 '22 16:07 jamespayor