IndexError: list index out of range with DBSTREAM when updating the metric (Silhouette)
Versions
river version: 0.15.0
Python version: 3.10.4
Operating system: macOS Ventura 13.2
Describe the bug
I get an IndexError when running DBSTREAM and then updating the Silhouette metric. Changing the parameter values does not seem to matter; the included example uses the default parameter values.
Steps/code to reproduce
Example code:
import pandas as pd
from river.cluster import DBSTREAM
from river import stream
from river.metrics import Silhouette
# Import the data
s1 = pd.read_table('http://cs.uef.fi/sipu/datasets/s1.txt',
                   sep = r"\s+",  # raw string so that \s is not treated as an invalid escape
                   names = ['x1', 'x2']).sample(5000, random_state = 42).reset_index(drop = True)
# Taking a random sample for a smaller batch of the data
n_samples = 500
df_first_batch = s1.sample(n_samples).reset_index(drop = True)
clusterer = DBSTREAM()
metric = Silhouette()
for x, _ in stream.iter_pandas(df_first_batch):
    clusterer = clusterer.learn_one(x)
    y_pred = clusterer.predict_one(x)
    metric = metric.update(x = x,
                           y_pred = y_pred,
                           centers = clusterer.centers)
The output:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[1], line 21
19 clusterer = clusterer.learn_one(x)
20 y_pred = clusterer.predict_one(x)
---> 21 metric = metric.update(x = x,
22 y_pred = y_pred,
23 centers = clusterer.centers)
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/river/metrics/silhouette.py:74, in Silhouette.update(self, x, y_pred, centers, sample_weight)
71 distance_closest_centroid = math.sqrt(utils.math.minkowski_distance(centers[y_pred], x, 2))
72 self._sum_distance_closest_centroid += distance_closest_centroid
---> 74 distance_second_closest_centroid = self._find_distance_second_closest_center(centers, x)
75 self._sum_distance_second_closest_centroid += distance_second_closest_centroid
77 return self
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/river/metrics/silhouette.py:67, in Silhouette._find_distance_second_closest_center(centers, x)
64 @staticmethod
65 def _find_distance_second_closest_center(centers, x):
66 distances = {i: math.sqrt(utils.math.minkowski_distance(centers[i], x, 2)) for i in centers}
---> 67 return sorted(distances.values())[-2]
IndexError: list index out of range
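Going by the traceback, the failing expression is `sorted(distances.values())[-2]`, which can only raise this IndexError when `centers` holds fewer than two entries at the time of the update. The sketch below reproduces that with plain dicts and no river dependency. It is an illustration under that assumption, not river's actual code: `euclidean`, `mirrored_lookup`, and `has_enough_centers` are hypothetical names, and the sample points are made up.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature dicts that share the same keys."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def mirrored_lookup(centers, x):
    # Mirrors the expression shown in the traceback: compute one distance
    # per center, sort them, and take index [-2] of the sorted list.
    distances = {i: euclidean(centers[i], x) for i in centers}
    return sorted(distances.values())[-2]


def has_enough_centers(centers):
    # Hypothetical guard: index [-2] needs at least two distances to exist.
    return len(centers) >= 2


# Made-up data for illustration only.
x = {"x1": 1.0, "x2": 2.0}
one_center = {0: {"x1": 0.0, "x2": 0.0}}
two_centers = {0: {"x1": 0.0, "x2": 0.0}, 1: {"x1": 1.0, "x2": 2.0}}

try:
    mirrored_lookup(one_center, x)
    raised = False
except IndexError:
    raised = True

print("IndexError with a single center:", raised)
print("Safe to look up with two centers:", has_enough_centers(two_centers))
```

If that is indeed the cause, a possible workaround in the reproduction loop would be to call `metric.update(...)` only when `len(clusterer.centers) >= 2`. I have not verified this against river 0.15.0, so treat it as a guess rather than a fix.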