containers
containers copied to clipboard
Binary search for Data.Sequence
Data.Sequence as a data structure supports binary search, but it looks like no methods are currently exposed for doing so.
(Well, not exactly binary search, but you know what I mean - search primitives that take O(log n) time given a monotone predicate.)
In particular, I'd love to have something like this in the API:
-- Splits the sequence at a point where the predicate becomes true.
-- When the predicate is monotone, the split occurs at the first such point.
splitL, splitR :: (a -> Bool) -> Seq a -> (Seq a, Seq a)
... or perhaps something more like:
-- Returns the item in the sequence at which the predicate becomes true.
-- The predicate is passed the item, and the numbers of elements on either side of it in the sequence.
searchL, searchR :: (Int -> a -> Int -> Bool) -> Seq a -> Maybe (Seq a, a, Seq a)
An obvious use case is for querying sorted data, stored in a Seq. But there are also more cunning things you can do, whenever the accumulated data is monotone.
(I know I can go and use Data.FingerTree for this sort of thing, but its constants are way worse than Seq, and I nevertheless do keep wanting to be able to search Seqs.)
splitL, splitR :: (a -> Bool) -> Seq a -> (Seq a, Seq a)
https://hackage.haskell.org/package/containers-0.6.5.1/docs/Data-Sequence.html#v:spanl ?
searchL, searchR :: (Int -> a -> Int -> Bool) -> Seq a -> Maybe (Seq a, a, Seq a)
in the API's naming style, this would be span(l/r)WithIndex, which does not exist right now.
Some functions have _WithIndex variants (e.g., map), some do not (e.g., filter?)
Those do linear scans. At the time I lodged this issue, I was interested in binary search for a Seq whose elements are in some kind of sorted order.
I see. So, something like https://hackage.haskell.org/package/containers-0.6.5.1/docs/Data-Set-Internal.html#v:spanAntitone
Seq whose elements are in some kind of sorted order
then why not use Data.Set (can you give an example where that does not work)
Data.Fingertree .. constants way worse than Seq
Wait, what? Data.Sequence implementation is a finger tree.
[EDIT] The only difference (from Data.Fingertree to Data.Sequence) is that the polymorphic measure component is specialized to Int (size)? An ideal compiler should be able to generate efficient code, and not require manual specialisation? Well, then maybe this can serve as a GHC test case.
Fwiw I'm no longer interested in pushing this issue! You can close it if it seems an unwelcome addition to the API, or unwelcome work.
Aside re Fingertree vs Seq constants: when I benchmarked back in the day there was a large difference. I bet the specialization and unpacking of the measure Int counts for a lot in code size and indirection, but I'm not sure this explains what I saw, or if the benchmarks would still see a large discrepancy.