ocaml.org icon indicating copy to clipboard operation
ocaml.org copied to clipboard

Documentation for `Hashtbl.hash` does not mention the default meaningful nodes limit

Open LaifsV1 opened this issue 8 months ago • 3 comments

The documentation for Hashtbl.hash currently reads:

Hashtbl.hash x associates a nonnegative integer to any value of any type. It is guaranteed that if x = y or Stdlib.compare x y = 0, then hash x = hash y. Moreover, hash always terminates, even on cyclic structures.

If I'm not mistaken, this is not documenting the limit on meaningful nodes traversed. Currently one has to look at books like Real World OCaml to find that the bound is set to 10:

OCaml’s polymorphic hash function works by walking over the data structure it’s given using a breadth-first traversal that is bounded in the number of nodes it’s willing to traverse. By default, that bound is set at 10 “meaningful” nodes.

i.e. hash of a list will ignore everything after the first 10 elements.

Without this bound documented, one may be confused by why every list with the same 10-element prefix collides. Hiding this also makes performance of Hashtables a mystery. It's unclear from reading the docs whether Hashtables require data to be optimised to be less than 10 nodes deep.

LaifsV1 avatar Feb 11 '25 18:02 LaifsV1