containers icon indicating copy to clipboard operation
containers copied to clipboard

Add 'between' to 'Data.Map' and 'Data.Set'

Open TOTBWF opened this issue 2 years ago • 10 comments

Patch Description

This PR adds 2 new functions to containers: Data.Map.between, and Data.Set.between. These functions will consume a map/set (respectively), and will filter the map/set so that the keys/values lie between an upper and lower bound (inclusive). This operation comes up pretty often in certain applications, and feels generally useful.

Notes

I'm by no means convinced that between is a good name, so if anyone feels that they have something more evocative please chime in!

TOTBWF avatar Oct 02 '21 03:10 TOTBWF

Interesting. But this is only one of four kinds of "between". What about the others?

treeowl avatar Oct 02 '21 03:10 treeowl

Perhaps a more generalized version could be added, along with something like:

data Bound a
  = Inclusive a
  | Exclusive a

If this was added, I still think the simpler inclusive version ought to be added for ergonomics sake, but I might just be biased because that's the one I need 🙂

TOTBWF avatar Oct 02 '21 04:10 TOTBWF

Yeah, I think you're biased. I imagine just as many people will need the others.

treeowl avatar Oct 02 '21 04:10 treeowl

For reference: GHC contains a related restrictMap utility:

https://gitlab.haskell.org/ghc/ghc/-/blob/18b9ba5602121c75f184f29e5b3e70bd7d4779c4/compiler/GHC/Cmm/Switch.hs#L467-470

Note that it's implemented using split, which I think might be more efficient.

sjakobi avatar Mar 09 '22 14:03 sjakobi

why would implementation via split be more efficient? It's something like

between lo hi = snd . M.split lo . fst . M.split hi

and the first split it might waste some work to build (re-balance) a tree that will be cut down by the second split. Consider the extreme case lo = hi = some mid-range key. The suggested implementation will do no balancing, but just return the singleton (or empty) map.

I'm all for keeping interfaces small, but this could be a case where the current interface does not allow an efficient implementation.

jwaldmann avatar Mar 10 '22 12:03 jwaldmann

Thanks for pointing this out, @jwaldmann! I think I misunderstood the current implementation and also missed the bit where split is strict in both the trees it returns.

In light of this, I think the current implementation is actually pretty good.

sjakobi avatar Mar 10 '22 14:03 sjakobi

Strictness - I did not check, but yes, I thought that all balanced maps are strict in the structure. Note, my argument (too much work) also holds if only one of the components returned by split is evaluated.

Perhaps the suggested function (and implementation) can be generalized to take a "bitonic filter" predicate? (a function f :: Key -> Ordering that must be monotonic - which we don't check - , and the result is the submap of all keys k with f k == EQ) But that's speculative generality ... It would solve the inclusive/exclusive variants, we could write filterBitonic (\ k -> if k <= lo then LT else if k > hi then GT else EQ), etc. Well, looks ugly, and looks like more operations - unless it's inlined everywhere. Altough I guess the cost is not in doing these comparisons, but in the construction (the balancing) of the result map.

[EDIT] the between function could detect and short-cut the case lo > hi (giving empty result). My suggested filterBitonic function could not do this (it cannot know beforehand that there is no k with f k == EQ).

jwaldmann avatar Mar 10 '22 14:03 jwaldmann

I made a note of this at https://gitlab.haskell.org/ghc/ghc/-/issues/21217

jwaldmann avatar Mar 10 '22 17:03 jwaldmann

See also https://github.com/haskell/containers/issues/778 . It says that implementation via split is "at most a constant factor off", which I believe for time, but not for space.

jwaldmann avatar Mar 12 '22 13:03 jwaldmann

I wonder if we can generalize this idea some more, taking some inspiration from mergeA. Suppose we have two predicates, p, q :: k -> Bool, with the following properties:

  1. If x <= y then p x <= p y and q x <= q y.
  2. For any x, q x <= p x.

Now p and q will divide any set/map into three portions:

  1. An initial part where p x = q x = False.
  2. A middle part where p x = True but q x = False.
  3. A final part where p x = q x = True.

We might want to do something different with each part: preserve it, drop it, or transform it.

treeowl avatar Jul 21 '22 17:07 treeowl