
Dealing with data-stream of constant values during a certain period

Open: shfa5275 opened this issue 3 years ago • 1 comment

In some cases, a stream keeps receiving the same value for a while. When every point in the sample is identical, xmin = xmax, so l is all zeros and l.sum() is 0. The division then yields l = nan, which leads to an exception in the following code:

def _cut(self, X, S, parent=None, side='l'):
    # Find max and min over all d dimensions
    xmax = X[S].max(axis=0)
    xmin = X[S].min(axis=0)

    # Compute l
    l = xmax - xmin
    l /= l.sum()

Any suggestions for handling this special case gracefully?
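The failure can be reproduced in isolation with NumPy alone. The sketch below is illustrative only: `cut_probabilities` is a hypothetical helper that mirrors the normalisation step quoted above, and it assumes the exception comes from later using `l` as a probability vector in a sampling call such as NumPy's `choice` (that line is not shown in this issue).

```python
import numpy as np

def cut_probabilities(X):
    """Hypothetical helper mirroring the quoted normalisation step (not part of rrcf)."""
    xmax = X.max(axis=0)
    xmin = X.min(axis=0)
    l = xmax - xmin
    # With identical points, l is all zeros, so this is 0 / 0 in every dimension.
    with np.errstate(invalid="ignore", divide="ignore"):
        l = l / l.sum()
    return l

# Five identical 3-dimensional points, as in a stream stuck on one value.
constant = np.full((5, 3), 2.0)
p = cut_probabilities(constant)

# Assumption: the dimension to cut is sampled with probabilities p.
raised = False
try:
    np.random.default_rng(0).choice(constant.shape[1], p=p)
except ValueError:
    # NumPy rejects probability vectors that contain NaN.
    raised = True

print(p, raised)
```

Checking `l.sum() == 0` before the division is enough to detect this case.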

shfa5275, Nov 08 '20 14:11

I do not think the algorithm is well-defined for the case where all points are exactly identical, because you cannot partition the point set.

https://klabum.github.io/rrcf/tree-construction.html

In this case, you would essentially skip the tree construction algorithm and create a root node that is also a leaf that contains all the points in the set.
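As a rough illustration of that suggestion, here is a minimal sketch of a recursive random-cut partition that returns a single leaf when all points are identical. It is a toy written for this thread, not rrcf's `RCTree`: `build_tree` and the nested-dict node layout are hypothetical, and the cut rule (dimension chosen proportionally to its range, cut point drawn uniformly within that range) is assumed from the snippet in the question.

```python
import numpy as np

def build_tree(X, rng=None):
    """Toy random-cut partition returning nested dicts (hypothetical, not rrcf's API)."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    total = span.sum()
    if total == 0:
        # All points are identical (or there is only one point):
        # skip the cut and store every point in a single leaf.
        return {"leaf": True, "point": X[0], "n": len(X)}
    # Choose the cut dimension with probability proportional to its range.
    q = rng.choice(X.shape[1], p=span / total)
    cut = rng.uniform(X[:, q].min(), X[:, q].max())
    left = X[:, q] <= cut
    if left.all():
        # Guard against rounding placing the cut exactly at the maximum.
        left = X[:, q] == X[:, q].min()
    return {
        "leaf": False,
        "dim": int(q),
        "cut": float(cut),
        "left": build_tree(X[left], rng),
        "right": build_tree(X[~left], rng),
    }

# A constant stream yields a root that is also a leaf holding all points.
root = build_tree(np.full((4, 2), 7.0))
print(root["leaf"], root["n"])
```

The same early return also covers exact duplicates further down the tree, since any subset of identical points ends up in one leaf with a count instead of triggering a division by zero.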

mdbartos, Nov 08 '20 20:11