IndexError: list index out of range with DBSTREAM when updating the metric (Silhouette)

Open • qetdr opened this issue 1 year ago • 6 comments

Versions

river version: 0.15.0
Python version: 3.10.4
Operating system: macOS Ventura 13.2

Describe the bug

Running DBSTREAM and then updating the Silhouette metric raises an IndexError. Changing the parameter values does not seem to matter; the example below uses the defaults.

Steps/code to reproduce

Example code:

import pandas as pd
from river.cluster import DBSTREAM
from river import stream
from river.metrics import Silhouette

# Import the data
s1 = pd.read_table('http://cs.uef.fi/sipu/datasets/s1.txt', 
                   sep = r"\s+", 
                   names = ['x1', 'x2']).sample(5000, random_state = 42).reset_index(drop = True)

# Taking a random sample for a smaller batch of the data
n_samples = 500
df_first_batch = s1.sample(n_samples).reset_index(drop = True)

clusterer = DBSTREAM()
metric = Silhouette()

for x, _ in stream.iter_pandas(df_first_batch):
    clusterer = clusterer.learn_one(x)
    y_pred = clusterer.predict_one(x)
    metric = metric.update(x = x, 
                           y_pred = y_pred, 
                           centers = clusterer.centers)

The output:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[1], line 21
     19 clusterer = clusterer.learn_one(x)
     20 y_pred = clusterer.predict_one(x)
---> 21 metric = metric.update(x = x, 
     22                        y_pred = y_pred, 
     23                        centers = clusterer.centers)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/river/metrics/silhouette.py:74, in Silhouette.update(self, x, y_pred, centers, sample_weight)
     71 distance_closest_centroid = math.sqrt(utils.math.minkowski_distance(centers[y_pred], x, 2))
     72 self._sum_distance_closest_centroid += distance_closest_centroid
---> 74 distance_second_closest_centroid = self._find_distance_second_closest_center(centers, x)
     75 self._sum_distance_second_closest_centroid += distance_second_closest_centroid
     77 return self

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/river/metrics/silhouette.py:67, in Silhouette._find_distance_second_closest_center(centers, x)
     64 @staticmethod
     65 def _find_distance_second_closest_center(centers, x):
     66     distances = {i: math.sqrt(utils.math.minkowski_distance(centers[i], x, 2)) for i in centers}
---> 67     return sorted(distances.values())[-2]

IndexError: list index out of range
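
Judging from the traceback alone, the failing expression is `sorted(distances.values())[-2]`, which can only raise this error when `centers` holds fewer than two entries. Presumably that is the case for DBSTREAM during the first updates, though I have not confirmed it. The following is a minimal standalone sketch of that behaviour, not river's actual code: the helper names are hypothetical, the centers are made-up dictionaries, and a plain Euclidean distance stands in for `utils.math.minkowski_distance`.

```python
import math


def distance_second_closest_center(centers, x):
    # Same indexing expression as in the traceback (silhouette.py, line 67),
    # with a plain Euclidean distance as a stand-in for river's helper.
    distances = {
        i: math.sqrt(sum((centers[i][k] - x[k]) ** 2 for k in x))
        for i in centers
    }
    return sorted(distances.values())[-2]


def has_enough_centers(centers):
    # Hypothetical guard: the [-2] lookup needs at least two centers.
    return len(centers) >= 2


x = {"x1": 1.0, "x2": 2.0}
one_center = {0: {"x1": 0.0, "x2": 0.0}}
two_centers = {0: {"x1": 0.0, "x2": 0.0}, 1: {"x1": 1.0, "x2": 2.0}}

# With a single center the lookup fails exactly like in the report.
try:
    distance_second_closest_center(one_center, x)
except IndexError as exc:
    error_message = str(exc)

# With two centers the lookup succeeds.
distance_with_two = distance_second_closest_center(two_centers, x)
```

If that assumption holds, a possible workaround in the loop above would be to call `metric.update(...)` only once `len(clusterer.centers) >= 2`.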

qetdr, Feb 19 '23 09:02