vptree icon indicating copy to clipboard operation
vptree copied to clipboard

Crash when creating a tree containing identical elements

Open mad-moo opened this issue 2 years ago • 0 comments

Creating a tree fails if the set of points contains about 500 or more points with a distance of 0:

import vptree

vptree.VPTree([1] * 500, lambda a, b: a-b)
[...]
  File "...\core\_methods.py", line 164, in _mean        
    arr = asanyarray(a)
RecursionError: maximum recursion depth exceeded while calling a Python object

During construction, VPTree attempts to recursively split the data into two equally size sets, "left" and "right". However, because all elements are identical, they are all added to the same set instead. Thus on each iteration the tree creates one empty set, and one with every single point. This large set only shrinks over time because each iteration removes one item which is to be used as vantage point. So instead of the set cleanly halving on each iteration, only a single datapoint is removed, leading to a degenerate, very deep tree. This in turn exhausts python's recursion limit.


A clean way to solve this would be for each node to not only store one vantage point, but a list of points which have 0 distance to the vantage point. Alternatively the tree could ensure the sets are approximately equal size, by moving some elements from the larger to the smaller.

I can contribute code, please just let me know which solution you'd prefer.

mad-moo avatar Sep 28 '22 19:09 mad-moo