Make Map length run in O(1) instead of O(n)
Hi OCaml developers 👋🏻
I want to propose a change to the OCaml standard library.
Please, let me know if it should be done differently!
What?
Currently, the implementation of `Map.cardinal` (a function that returns the size of the `Map`) takes linear time:
https://github.com/ocaml/ocaml/blob/b72d99b3dbf3a593f9ff38acc4a375a2e2fd8749/stdlib/map.ml#L493-L495
I propose to
- [ ] Change the implementation to run in constant time, independent of the tree size
- [ ] Document that the implementation runs in constant time
Why?
It's a common expectation for a dictionary-like data structure to return its own size in constant time.
How?
The current linear-time behaviour is a consequence of the fact that a tree node (the internal `Map` representation) stores height instead of size:
https://github.com/ocaml/ocaml/blob/b72d99b3dbf3a593f9ff38acc4a375a2e2fd8749/stdlib/map.ml#L77-L79
I propose to change the internals and store size (and change all the relevant functions); a sketch of the idea follows below. I expect the existing test suite to be exhaustive enough to ensure that there are no regressions after this change. But, obviously, new tests are always welcome!
There is prior art on how to rebalance a binary search tree when a node stores its size. One can refer to the Haskell implementation:
- https://github.com/haskell/containers/blob/f5d0b13251291c3bd1ae396f3e6c8b0b9eaf58b0/containers/src/Data/Map/Internal.hs#L4162-L4208
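To make the idea concrete, here is a minimal, hypothetical sketch of a node that caches its subtree size instead of its height; the type and function names are illustrative, not the actual stdlib definitions:

```ocaml
(* Hypothetical sketch: each node caches the size of its subtree instead of
   its height. Names are illustrative, not the actual stdlib definitions. *)
type ('k, 'v) t =
  | Empty
  | Node of { l : ('k, 'v) t; v : 'k; d : 'v; r : ('k, 'v) t; size : int }

let cardinal = function
  | Empty -> 0
  | Node { size; _ } -> size

(* Smart constructor: maintaining the cached size is O(1) per node. *)
let node l v d r = Node { l; v; d; r; size = cardinal l + 1 + cardinal r }
```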
Impact
This should be a backward-compatible change, impacting only internals.
Let me know what you think!
The comments in the Haskell containers implementation say:
```haskell
-- The implementation of 'Map' is based on /size balanced/ binary trees (or
-- trees of /bounded balance/) as described by:
--
--   * Stephen Adams, \"/Efficient sets: a balancing act/\",
--     Journal of Functional Programming 3(4):553-562, October 1993,
--     <http://www.swiss.ai.mit.edu/~adams/BB/>.
--   * J. Nievergelt and E.M. Reingold,
--     \"/Binary search trees of bounded balance/\",
--     SIAM journal of computing 2(1), March 1973.
```
```
[balance l x r] balances two trees with value x.
The sizes of the trees should balance after decreasing the
size of one of them. (a rotation).

[delta] is the maximal relative difference between the sizes of
two trees, it corresponds with the [w] in Adams' paper.
[ratio] is the ratio between an outer and inner sibling of the
heavier subtree in an unbalanced setting. It determines
whether a double or single rotation should be performed
to restore balance. It corresponds with the inverse
of $\alpha$ in Adams' article.

Note that according to Adams' paper:
- [delta] should be larger than 4.646 with a [ratio] of 2.
- [delta] should be larger than 3.745 with a [ratio] of 1.534.

But Adams' paper is erroneous:
- It can be proved that for delta=2 and delta>=5 there does
  not exist any ratio that would work.
- Delta=4.5 and ratio=2 does not work.

That leaves two reasonable variants, delta=3 and delta=4,
both with ratio=2.
- A lower [delta] leads to a more 'perfectly' balanced tree.
- A higher [delta] performs less rebalancing.

In the benchmarks, delta=3 is faster on insert operations,
and delta=4 has slightly better deletes. As the insert speedup
is larger, we currently use delta=3.
```
I have two natural questions:
1. Is the algorithm believed to be correct? Is it clear, or at least well understood, that it has the expected logarithmic worst-case bounds?
2. How does its performance compare in practice to OCaml's current implementation?
Answering (2) requires implementing the structure in OCaml. I don't see how we would be able to make a decision on stdlib inclusion without an answer to (2), so I think that you should first implement it in OCaml and run benchmarks. Note that at this point you could release it as a third-party package on opam, as a first step (whether or not stdlib inclusion is decided) that would already benefit the community.
It looks like this means adding a `count` field to the `Node` type, as well as updating the `bal` function to correctly set the `count` field when balancing, and then updating the `add` and `remove` functions to maintain the `count` field of any affected nodes. But my concern is that achieving O(1) for the `cardinal` operation may add O(log n) bookkeeping work to `add` and `remove`.
These are called weight-balanced trees. Here's an implementation: https://github.com/let-def/grenier/blob/master/baltree/bt2.ml and the map structure based on it: https://github.com/let-def/grenier/blob/master/balmap/map.ml
The current AVL-based implementation was once reimplemented in Coq and verified: Functors for Proofs and Programs. In fact it helped find a bug in the OCaml implementation: #8176, #8177, #8178, 38558879, d6070826.
The smallest possible change that would accomplish @chshersh's goal would be as follows.
```diff
 type 'a t =
     Empty
-  | Node of {l:'a t; v:key; d:'a; r:'a t; h:int}
+  | Node of {l:'a t; v:key; d:'a; r:'a t; h:int; sz:int}

 let cardinal = function
     Empty -> 0
-  | Node {l; r} -> cardinal l + 1 + cardinal r
+  | Node {sz} -> sz

 let create l x d r =
   let hl = height l and hr = height r in
-  Node{l; v=x; d; r; h=(if hl >= hr then hl + 1 else hr + 1)}
+  let szl = cardinal l and szr = cardinal r in
+  Node{l; v=x; d; r; h=(if hl >= hr then hl + 1 else hr + 1); sz=(szl + szr + 1)}
```
Other small administrative edits will be needed: for example, `singleton` should initialize with `sz = 1`, and operations such as `map` should carry the `sz` through unchanged, etc.
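For instance, hypothetically, `singleton` would become:

```ocaml
(* Hypothetical, following the [sz] field from the diff above: *)
let singleton x d = Node { l = Empty; v = x; d; r = Empty; h = 1; sz = 1 }
```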
This requires no changes to the balancing scheme. The idea is to take advantage of the fact that the height already needs to be automatically computed. Whenever the library computes the height of a node, we can also automatically compute its size in O(1) time. As a result, the size of the whole tree is always known at the root and can be retrieved in O(1).
The only additional runtime cost is the time and space required to store and compute the sizes. In my experience, this cost is typically lost in the wash in comparison to all the other costs of managing tree nodes. But, some benchmarking would of course be needed.
By the way, for those curious: this is a form of "augmented tree" or "augmented map" (using the terminology of Sun, Ferizovic, and Blelloch). The observation is that any tree structure can be augmented with an associative operation with intermediate values cached at the nodes, by applying the operation a constant number of times at each node construction. The associative operation in this case is just addition (on sizes).
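To make that concrete, here is a hedged sketch of the general construction; the module and field names are invented for illustration, and instantiating the monoid with addition on per-node weights of 1 yields subtree sizes:

```ocaml
(* Illustrative sketch of an "augmented tree": cache the result of an
   associative operation with identity (a monoid) at each node. *)
module type Monoid = sig
  type t
  val empty : t
  val combine : t -> t -> t (* must be associative *)
end

module Augmented (M : Monoid) = struct
  type 'a t =
    | Leaf
    | Node of { l : 'a t; x : 'a; m : M.t; r : 'a t; cached : M.t }

  let cached = function Leaf -> M.empty | Node { cached; _ } -> cached

  (* The operation is applied a constant number of times per node
     construction, so the root's cached value (e.g. the size) is
     maintained at no asymptotic cost and read in O(1). *)
  let node l x m r =
    Node { l; x; m; r; cached = M.combine (cached l) (M.combine m (cached r)) }
end
```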
> The smallest possible change that would accomplish @chshersh's goal would be as follows.
This approach increases the memory footprint of the data structure significantly (and it also adds some runtime overhead to other operations, even if the complexity does not change). Whether it is a good tradeoff or not depends on the use case.
As for the original proposal, I can only second @gasche's comments. In addition, let's be careful about changing the actual memory representation; some users might expect the `Map` data structure to be stable enough across versions and happily marshal such values. (I don't think there is an official commitment to maintaining marshal-level compatibility across versions, but the stdlib already contains hacks to allow reading `Hashtbl` values stored by old versions of OCaml, ...)
My very first implementation of sets and maps, in Caml Light 0.6 I think, used weight-balanced binary search trees, following Stephen Adams's code. Then I showed it to Bruno Salvy (someone who knows how to analyze an algorithm) and he pointed out that weight-balanced trees cause significantly more rebalancing than height-balanced trees as the size of the trees increases. So I switched to the current relaxed-AVL height-balancing criterion in Caml Light 0.7, and have no intention of going back.
Like @alainfrisch, I'd rather not add a "size" field to every node of a map or a set. That's quite a space overhead for such a minor feature.
> These are called weight-balanced trees. Here's an implementation: https://github.com/let-def/grenier/blob/master/baltree/bt2.ml and the map structure based on it: https://github.com/let-def/grenier/blob/master/balmap/map.ml
>
> The current AVL-based implementation was once reimplemented in Coq and verified: Functors for Proofs and Programs. In fact it helped find a bug in the OCaml implementation: #8176, #8177, #8178, 3855887, d607082.
Thanks for quoting my code, I am surprised someone knows/remembers that :). FWIW, the balancing criterion I implemented is the one from https://link.springer.com/chapter/10.1007/3-540-48224-5_39 . So, technically, it is not a weight-balanced tree, even though it uses the size.
And I also did an (extremely ugly) implementation in Coq, mostly to learn Coq a bit. I don't remember what I proved, though (not the complexity, as I was implementing the operations in plain Gallina, but I did prove that the invariants were maintained; that, plus the simple recursion, might be enough to be convinced that the complexity is right...).
> Like @alainfrisch, I'd rather not add a "size" field to every node of a map or a set. That's quite a space overhead for such a minor feature.
What if we add size only to the root level? A `Map` would then become a record of two fields:
- the tree itself
- its size

This would require changing the implementation of functions like `add`, `update`, and `remove` to track the size and keep it up to date; see the sketch below.
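A minimal sketch of this idea, wrapping the existing stdlib map (over string keys for concreteness; the code is illustrative, not a patch):

```ocaml
module SM = Map.Make (String)

(* Keep the size only at the root; the tree itself is unchanged. *)
type 'a t = { tree : 'a SM.t; size : int }

let empty = { tree = SM.empty; size = 0 }

let cardinal m = m.size (* O(1) *)

let add k v m =
  (* An extra [mem] traversal to learn whether the key is new; a real
     implementation would have the internal [add] report this itself. *)
  let grew = not (SM.mem k m.tree) in
  { tree = SM.add k v m.tree; size = (if grew then m.size + 1 else m.size) }

let remove k m =
  let shrank = SM.mem k m.tree in
  { tree = SM.remove k m.tree; size = (if shrank then m.size - 1 else m.size) }
```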
> What if we add size only to the root level? A `Map` would then become a record of two fields:
> - the tree itself
> - its size
>
> This would require changing the implementation of functions like `add`, `update`, and `remove` to track the size and keep it up to date.
It's a little trickier than that, because bulk operations can result in large changes in the size. At first it might seem like you could address this by recomputing the size (by walking the whole tree) after completing any bulk operation, but this would result in a significant slowdown, asymptotically.
Functions like `union` might be a good example; `union`-ing two trees is almost always asymptotically faster than linear.
So, the code would need to be updated to track size diffs across bulk operations (the sketch below illustrates why). It's an interesting challenge, and seems doable. But certainly tricky! And it would probably require large changes to the code.
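Continuing the illustrative root-size wrapper sketched above, `union` shows the difficulty: the result's size is |a| + |b| minus the overlap (and minus any keys dropped when the combining function returns `None`), which the two root sizes alone cannot determine:

```ocaml
(* Illustrative only: recounting here is linear in the result, which
   would destroy [union]'s sub-linear behaviour on maps of very
   different sizes. *)
let union f a b =
  let tree = SM.union f a.tree b.tree in
  { tree; size = SM.cardinal tree }
```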
An important point that is missing from the discussion so far: why should we want to make the `cardinal` function O(1) by decreasing the performance of more important functions? In particular, I am struggling to imagine a situation where the performance of `cardinal` matters compared to the time spent creating the map.
Moreover, it is easy to memoize the computation of `cardinal`, if required, by defining:
```ocaml
module Memo (M : sig type 'a t val cardinal : 'a t -> int end) = struct
  type 'a t = { map : 'a M.t; cardinal : int Lazy.t }
  let make x = { map = x; cardinal = lazy (M.cardinal x) }
end
```
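For example, a hypothetical use with the stdlib string map:

```ocaml
module SM = Map.Make (String)
module MSM = Memo (SM)

let m = MSM.make (SM.of_seq (List.to_seq [ ("a", 1); ("b", 2) ]))
let n = Lazy.force m.MSM.cardinal (* O(n) on the first force, O(1) afterwards *)
```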
I am afraid that, without such a use case, the discussion sounds a bit like premature optimisation to me.
> An important point that is missing from the discussion so far: why should we want to make the `cardinal` function O(1) by decreasing the performance of more important functions?
My guess: because that's what the C++ standard library does (constant-time `size` operations for all containers), so people have grown accustomed to it, and carelessly write `size(x) > 0` instead of `not (empty x)`. I personally think it's a bad idea: emptiness tests must be constant time, but size operations can be linear time. But it's hard to disagree with the received C++ perception.
Some pointers to discussions in the context of C++ std:
- in favor of constant-time size: https://howardhinnant.github.io/On_list_size.html
- questioning constant-time size: https://www.reddit.com/r/cpp/comments/1140qip/making_stdlistsize_o1_is_a_mistake
Another guess: it's O(1) in Haskell, and people compare OCaml with Haskell pretty frequently.
Side note: in the OCaml standard library, the time complexity of the standard data structure operations is not documented (this is actually, indirectly, a point made by the OP), so people may find it surprising because they had wrongly assumed that `cardinal` is O(1). In that case, another takeaway is that it would be a good idea to document the big-O for all relevant functions, as Haskell does, e.g. https://hackage.haskell.org/package/containers-0.4.0.0/docs/Data-Map.html#v:size
baby (documentation) implements weight-balanced trees, following the paper Joinable Parallel Balanced Binary Trees by Blelloch, Ferizovic, and Sun (2022). The complexity of every operation is documented.
I have tested it quite heavily, and Wassel Bousmaha has verified its balancing code in Coq, so I have strong confidence that it is correct.
I have benchmarked it quite heavily. I find that it is usually slightly faster (and sometimes significantly faster) than OCaml's `Set` library.
For the moment, it implements just the `Set` API, not the `Map` API.
I am still waiting for feedback from users.