
Make Map length run in O(1) instead of O(n)

chshersh opened this issue 1 year ago • 13 comments

Hi OCaml developers team 👋🏻

I want to propose a change to the OCaml standard library.

Please, let me know if it should be done differently!

What?

Currently, the implementation of Map.cardinal (a function to return the size of the Map) takes linear time:

https://github.com/ocaml/ocaml/blob/b72d99b3dbf3a593f9ff38acc4a375a2e2fd8749/stdlib/map.ml#L493-L495

I propose to

  • [ ] Change the implementation to run in constant time, independent of the tree size
  • [ ] Document that the implementation runs in constant time

Why?

It's a common expectation for a dictionary-like data structure to return its own size in constant time.

How?

The current linear-time behaviour is a consequence of the fact that a tree node (internal Map representation) stores height instead of size.

https://github.com/ocaml/ocaml/blob/b72d99b3dbf3a593f9ff38acc4a375a2e2fd8749/stdlib/map.ml#L77-L79

I propose to change the internals to store size instead (and to change all the relevant functions accordingly). I expect the existing test suite to be exhaustive enough to ensure there are no regressions after this change. But, obviously, new tests are always welcome!
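For illustration, here is a hypothetical sketch (not the actual stdlib code) of what a size-annotated node could look like, with the key type fixed to int so the snippet is self-contained:

```ocaml
(* Hypothetical sketch of a node that caches its subtree size instead of
   its height, mirroring the shape of the stdlib's internal Node record.
   [key] stands in for the ordered key type of the functor argument. *)
type key = int

type 'a t =
  | Empty
  | Node of { l : 'a t; v : key; d : 'a; r : 'a t; s : int }

(* With the size cached at every node, cardinal is a single match. *)
let cardinal = function
  | Empty -> 0
  | Node { s; _ } -> s

(* Smart constructor: recompute the cached size from the children,
   paying O(1) extra work per node construction. *)
let node l v d r =
  Node { l; v; d; r; s = cardinal l + 1 + cardinal r }
```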

There is prior art on how to rebalance a binary search tree when a node stores the size. One can refer to the Haskell implementation:

  • https://github.com/haskell/containers/blob/f5d0b13251291c3bd1ae396f3e6c8b0b9eaf58b0/containers/src/Data/Map/Internal.hs#L4162-L4208

Impact

This should be a backward-compatible change, impacting only internals.


Let me know what you think!

chshersh avatar Dec 27 '23 14:12 chshersh

The comments in GHC's implementation say:

-- The implementation of 'Map' is based on /size balanced/ binary trees (or
-- trees of /bounded balance/) as described by:
--
--    * Stephen Adams, \"/Efficient sets: a balancing act/\",
--     Journal of Functional Programming 3(4):553-562, October 1993,
--     <http://www.swiss.ai.mit.edu/~adams/BB/>.
--    * J. Nievergelt and E.M. Reingold,
--      \"/Binary search trees of bounded balance/\",
--      SIAM journal of computing 2(1), March 1973.
  [balance l x r] balances two trees with value x.
  The sizes of the trees should balance after decreasing the
  size of one of them. (a rotation).

  [delta] is the maximal relative difference between the sizes of
          two trees, it corresponds with the [w] in Adams' paper.
  [ratio] is the ratio between an outer and inner sibling of the
          heavier subtree in an unbalanced setting. It determines
          whether a double or single rotation should be performed
          to restore balance. It is corresponds with the inverse
          of $\alpha$ in Adam's article.

  Note that according to the Adam's paper:
  - [delta] should be larger than 4.646 with a [ratio] of 2.
  - [delta] should be larger than 3.745 with a [ratio] of 1.534.

  But the Adam's paper is erroneous:
  - It can be proved that for delta=2 and delta>=5 there does
    not exist any ratio that would work.
  - Delta=4.5 and ratio=2 does not work.

  That leaves two reasonable variants, delta=3 and delta=4,
  both with ratio=2.

  - A lower [delta] leads to a more 'perfectly' balanced tree.
  - A higher [delta] performs less rebalancing.

  In the benchmarks, delta=3 is faster on insert operations,
  and delta=4 has slightly better deletes. As the insert speedup
  is larger, we currently use delta=3.
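To make the balance condition above concrete, here is a small hypothetical helper expressing the weight-balanced invariant with delta = 3 (stated over sizes rather than heights; not code from containers or the OCaml stdlib):

```ocaml
(* Weight-balanced invariant with delta = 3: neither subtree may be more
   than [delta] times the size of its sibling. Trees with at most one
   element are trivially balanced. Hypothetical helper for illustration. *)
let delta = 3

let is_balanced size_l size_r =
  size_l + size_r <= 1
  || (size_l <= delta * size_r && size_r <= delta * size_l)
```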

I have two natural questions:

  1. Is the algorithm believed to be correct, and is it clear, or at least well understood, that it has the expected logarithmic worst-case bounds?
  2. How does the performance compare in practice to OCaml's current implementation?

Answering (2) requires implementing the structure in OCaml. I don't see how we would be able to make a decision on stdlib inclusion without an answer to (2). So I think that you should first implement it in OCaml and run benchmarks. Note that at this point you could release it as a third-party package on opam, as a first step (whether or not stdlib inclusion is decided) that already benefits the community.

gasche avatar Dec 27 '23 15:12 gasche

It looks like this means adding a count field to the Node type, updating the bal function to set the count field correctly when balancing, and then updating the add and remove functions to maintain the count field of any affected nodes. But my concern is that achieving O(1) for the cardinal operation adds bookkeeping to add and remove along their whole O(log n) update path.

AfolabiOlaoluwa avatar Dec 27 '23 16:12 AfolabiOlaoluwa

This is called weight-balanced trees, here's an implementation: https://github.com/let-def/grenier/blob/master/baltree/bt2.ml and the map structure based on it: https://github.com/let-def/grenier/blob/master/balmap/map.ml

The current AVL-based implementation was once reimplemented in Coq and verified: Functors for Proofs and Programs. In fact it helped find a bug in the OCaml implementation: #8176, #8177, #8178, 38558879, d6070826.

wikku avatar Dec 27 '23 17:12 wikku

The smallest possible change that would accomplish @chshersh's goal would be as follows.

    type 'a t =
        Empty
-     | Node of {l:'a t; v:key; d:'a; r:'a t; h:int}
+     | Node of {l:'a t; v:key; d:'a; r:'a t; h:int; sz:int}
    let cardinal = function
        Empty -> 0
-     | Node {l; r} -> cardinal l + 1 + cardinal r
+     | Node {sz} -> sz
    let create l x d r =
      let hl = height l and hr = height r in
+     let szl = cardinal l and szr = cardinal r in
-     Node{l; v=x; d; r; h=(if hl >= hr then hl + 1 else hr + 1)}
+     Node{l; v=x; d; r; h=(if hl >= hr then hl + 1 else hr + 1); sz=(szl + szr + 1)}

Other small administrative edits will be needed, for example: singleton should initialize with sz = 1, and operations such as map should carry the sz through unchanged, etc.

This requires no changes to the balancing scheme. The idea is to take advantage of the fact that the height already needs to be automatically computed. Whenever the library computes the height of a node, we can also automatically compute its size in O(1) time. As a result, the size of the whole tree is always known at the root and can be retrieved in O(1).

The only additional runtime cost is the time and space required to store and compute the sizes. In my experience, this cost is typically lost in the wash in comparison to all the other costs of managing tree nodes. But, some benchmarking would of course be needed.

By the way, for those curious: this is a form of "augmented tree" or "augmented map" (using the terminology of Sun, Ferizovic, and Blelloch). The observation is that any tree structure can be augmented with an associative operation with intermediate values cached at the nodes, by applying the operation a constant number of times at each node construction. The associative operation in this case is just addition (on sizes).
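As a sketch of that observation (hypothetical code, not from the comment above, using addition on sizes as the associative operation):

```ocaml
(* Sketch of an augmented tree: cache the result of an associative
   operation (a monoid) at every node, paying a constant number of
   applications per node construction. Sizes are the special case where
   the monoid is (+, 0) and every element measures as 1. *)
module type Monoid = sig
  type t
  val empty : t
  val combine : t -> t -> t
end

module Augment (M : Monoid) = struct
  type 'a t =
    | Leaf
    | Node of { l : 'a t; x : 'a; r : 'a t; cached : M.t }

  let cached = function Leaf -> M.empty | Node { cached; _ } -> cached

  (* [measure] maps an element into the monoid; for sizes, [fun _ -> 1]. *)
  let node measure l x r =
    Node { l; x; r;
           cached = M.combine (cached l) (M.combine (measure x) (cached r)) }
end
```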

shwestrick avatar Dec 27 '23 17:12 shwestrick

The smallest possible change that would accomplish @chshersh's goal would be as follows.

This approach increases the memory footprint of the data structure significantly (and it also adds some runtime overhead to other operations, even if the complexity does not change). Whether it is a good tradeoff or not depends on the use case.

As for the original proposal, I can only second @gasche's comments. In addition, let's be careful about changing the actual memory representation; some users might expect the Map data structure to be stable enough across versions and happily marshal them. (I don't think there is an official commitment to maintaining marshal-level compatibility across versions, but the stdlib already contains hacks to allow reading Hashtbl values stored by old versions of OCaml, ...)

alainfrisch avatar Dec 27 '23 20:12 alainfrisch

My very first implementation of sets and maps, in Caml Light 0.6 I think, used weight-balanced binary search trees, following Stephen Adams's code. Then I showed it to Bruno Salvy (someone who knows how to analyze an algorithm) and he pointed out that weight-balanced trees cause significantly more rebalancing than height-balanced trees as the size of the trees increases. So I switched to the current relaxed-AVL height-balancing criterion in Caml Light 0.7, and have no intention of going back.

Like @alainfrisch, I'd rather not add a "size" field to every node of a map or a set. That's quite a space overhead for such a minor feature.

xavierleroy avatar Dec 29 '23 09:12 xavierleroy

This is called weight-balanced trees, here's an implementation: https://github.com/let-def/grenier/blob/master/baltree/bt2.ml and the map structure based on it: https://github.com/let-def/grenier/blob/master/balmap/map.ml

The current AVL-based implementation was once reimplemented in Coq and verified: Functors for Proofs and Programs. In fact it helped find a bug in the OCaml implementation: #8176, #8177, #8178, 3855887, d607082.

Thanks for quoting my code, I am surprised someone knows/remembers that :). FWIW, the balancing criterion I implemented is: https://link.springer.com/chapter/10.1007/3-540-48224-5_39 . So, technically, it is not a weight-balanced tree, even though it uses the size.

And I also did an (extremely ugly) implementation in Coq, mostly to learn Coq a bit. I don't remember what I proved, though (not the complexity, as I was implementing operations in plain Gallina; I did prove that the invariants were maintained, and that plus the simple recursion might be enough to be convinced that the complexity is right ...).

let-def avatar Jan 05 '24 09:01 let-def

Like @alainfrisch, I'd rather not add a "size" field to every node of a map or a set. That's quite a space overhead for such a minor feature.

What if we add the size only at the root level? A Map would then become a record of two fields:

  1. Tree itself
  2. Its size

This will require changing the implementation of functions like add, update, and remove to track the size and check whether each operation actually changed it.

chshersh avatar Jan 05 '24 14:01 chshersh

What if we add the size only at the root level? A Map would then become a record of two fields:

  1. Tree itself
  2. Its size

This will require changing the implementation of functions like add, update, and remove to track the size and check whether each operation actually changed it.

It's a little trickier than that, because bulk operations can result in large changes in the size. At first it might seem like you could address this by recomputing the size (by walking the whole tree) after completing any bulk operation, but this would result in a significant slowdown, asymptotically.

Functions like union might be a good example; union-ing two trees is almost always asymptotically faster than linear.

So, the code would need to be updated to track size diffs across bulk operations. It's an interesting challenge, and seems doable. But, certainly tricky! And would probably require large changes to the code.
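For the point operations alone, the root-level bookkeeping could look something like the following hedged sketch, wrapping the existing stdlib Map (module and field names hypothetical; bulk operations deliberately omitted, for the reason just discussed):

```ocaml
(* Hypothetical root-level wrapper: keep one size next to the tree and
   maintain it across point updates by checking membership first. Bulk
   operations like [union] would need a different strategy. *)
module Sized (M : Map.S) = struct
  type 'a t = { map : 'a M.t; size : int }

  let empty = { map = M.empty; size = 0 }

  let cardinal t = t.size  (* O(1) *)

  let add k v t =
    let delta = if M.mem k t.map then 0 else 1 in
    { map = M.add k v t.map; size = t.size + delta }

  let remove k t =
    let delta = if M.mem k t.map then 1 else 0 in
    { map = M.remove k t.map; size = t.size - delta }
end
```

Note that even for point operations this roughly doubles the logarithmic work (one mem plus one add per update).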

shwestrick avatar Jan 05 '24 15:01 shwestrick

An important point missing from the discussion so far: why should we want to make the cardinal function O(1) at the cost of decreasing the performance of more important functions?

In particular, I am struggling to imagine a situation where the performance of cardinal matters compared to the time spent creating the map.

Moreover, it is easy to memoize the computation of cardinal, if required, by defining

module Memo(M: sig type 'a t val cardinal: 'a t -> int end) = struct
  type 'a t = { map: 'a M.t; cardinal: int Lazy.t }
  let make x = { map = x; cardinal = lazy (M.cardinal x) }
end
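This functor can be applied to any of the stdlib maps. A self-contained usage sketch (module names hypothetical, restating the functor so the snippet compiles on its own):

```ocaml
(* Restatement of the Memo functor above, so this sketch is
   self-contained. *)
module Memo (M : sig type 'a t val cardinal : 'a t -> int end) = struct
  type 'a t = { map : 'a M.t; cardinal : int Lazy.t }
  let make x = { map = x; cardinal = lazy (M.cardinal x) }
end

module StringMap = Map.Make (String)
module MemoMap = Memo (StringMap)

let m =
  StringMap.empty
  |> StringMap.add "a" 1
  |> StringMap.add "b" 2
  |> MemoMap.make

(* The O(n) walk over the tree happens at most once, on first force. *)
let size = Lazy.force m.MemoMap.cardinal
```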

I am afraid that without such a use case, the discussion sounds a bit like premature optimisation to me.

Octachron avatar Jan 05 '24 15:01 Octachron

An important point that is missing from the discussion so far is that why should we want to make the cardinal function O(1) by decreasing the performance of more important functions?

My guess: because that's what the C++ standard library does (constant-time size operations for all containers), so people have grown accustomed to it, and carelessly write size(x) > 0 instead of not (empty x). I personally think it's a bad idea: emptiness tests must be constant time, but size operations can be linear time. But it's hard to argue with expectations inherited from C++.

Some pointers to discussions in the context of C++ std:

  • in favor of constant-time size: https://howardhinnant.github.io/On_list_size.html
  • questioning constant-time size: https://www.reddit.com/r/cpp/comments/1140qip/making_stdlistsize_o1_is_a_mistake

xavierleroy avatar Jan 06 '24 16:01 xavierleroy

Another guess: it's O(1) in Haskell, and people compare OCaml with Haskell pretty frequently.

Side note: in the OCaml standard library, the time complexity of the standard data structure operations is not documented (this is, indirectly, a point made by OP), so people can find cardinal surprising because they had wrongly assumed it is O(1). In that case, another takeaway is that it would be a good idea to document the big-O complexity of all relevant functions, as Haskell does, e.g. https://hackage.haskell.org/package/containers-0.4.0.0/docs/Data-Map.html#v:size

yawaramin avatar Jan 06 '24 22:01 yawaramin

baby (documentation) implements weight-balanced trees, following the paper Joinable Parallel Balanced Binary Trees by Blelloch, Ferizovic, and Sun (2022). The complexity of every operation is documented.

I have tested it quite heavily, and Wassel Bousmaha has verified its balancing code in Coq, so I have strong confidence that it is correct.

I have benchmarked it quite heavily. I find that it is usually slightly faster (and sometimes significantly faster) than OCaml's Set library.

For the moment, it implements just the Set API, not the Map API.

I am still waiting for feedback from users.

fpottier avatar Aug 08 '24 20:08 fpottier