KMeansClustering no longer works with custom objects.
The following code gives an error:
from cluster import KMeansClustering
from os import urandom
from pprint import pprint
from random import randint


class ObjectWithMetadata(object):
    def __init__(self, value):
        self.value = value
        self.uid = urandom(10).encode('base64').strip()

    def __repr__(self):
        return 'ObjectWithMetadata({!r}, {!r})'.format(self.value, self.uid)


data = [ObjectWithMetadata(randint(0, 1000))
        for _ in range(200)]
cl = KMeansClustering(data, lambda x, y: float(abs(x.value-y.value)))
clustered = cl.getclusters(10)
pprint(clustered)
print(len(clustered))
The error:
Traceback (most recent call last):
  File "metadata.py", line 21, in <module>
    clustered = cl.getclusters(10)
  File "/home/exhuma/work/github/python-cluster/cluster/method/kmeans.py", line 109, in getclusters
    res = self.assign_item(item, cluster)
  File "/home/exhuma/work/github/python-cluster/cluster/method/kmeans.py", line 124, in assign_item
    if self.distance(item, centroid(cluster)) < self.distance(
  File "/home/exhuma/work/github/python-cluster/cluster/util.py", line 175, in centroid
    for i in range(len(data[0])):
TypeError: object of type 'ObjectWithMetadata' has no len()
With the current implementation of KMeansClustering, solving this would get quite messy. Running the K-Means method requires three functions which strongly depend on the nature of the items:
- distance
- equality
- centroid
It would make more sense to require non-tuple data elements to be subclasses of an ABC which declares the above methods as abstract. Distance could be implemented as __sub__ and equality as __eq__.
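The idea above could be sketched roughly as follows. This is only an illustration of the proposal, not code from the cluster package: the names ClusterItem and ScalarItem, and the choice of a centroid classmethod, are assumptions made for this example.

```python
from abc import ABC, abstractmethod


class ClusterItem(ABC):
    """Hypothetical base class; not part of the cluster package."""

    @abstractmethod
    def __sub__(self, other):
        """Return the distance between self and other as a float."""

    @abstractmethod
    def __eq__(self, other):
        """Return True if both items count as the same point."""

    @classmethod
    @abstractmethod
    def centroid(cls, items):
        """Return a new item located at the centre of items."""


class ScalarItem(ClusterItem):
    """Example item: a numeric value plus arbitrary metadata."""

    def __init__(self, value, uid=None):
        self.value = value
        self.uid = uid

    def __sub__(self, other):
        return float(abs(self.value - other.value))

    def __eq__(self, other):
        return self.value == other.value

    # Defining __eq__ sets __hash__ to None, so restore hashing explicitly.
    def __hash__(self):
        return hash(self.value)

    @classmethod
    def centroid(cls, items):
        # The centroid of scalar items is the mean of their values.
        return cls(sum(i.value for i in items) / len(items))


a, b = ScalarItem(10, "a"), ScalarItem(30, "b")
print(a - b)                              # distance between a and b
print(ScalarItem.centroid([a, b]).value)  # mean of the two values
```

With something like this, the clustering code would only call item - other, item == other and type(item).centroid(items), and would not need to know how the items are structured.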
I will follow semantic versioning, and as this would change the external API, I will postpone this until 2.0.
I believe HierarchicalClustering also does not work with custom objects. For example, I have the following code:
cl = HierarchicalClustering(
    salary_head_probables_list,
    lambda x, y: _find_squared_distance(x['poly_center'], y['poly_center']))
where
salary_head_probables_list is a list of custom objects
_find_squared_distance returns a float
I am getting this error trace:
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/cluster/method/hierarchical.py", line 79, in __init__
    BaseClusterMethod.__init__(self, sorted(data), distance_function)
TypeError: unorderable types: dict() < dict()
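The trace points at the sorted(data) call: in Python 3, dicts define no ordering, so sorting a list of them raises TypeError. A minimal reproduction outside the package might look like this. The sample dicts are made up for illustration, and the key-based sort only shows that an explicit ordering avoids the error; whether the package can be given such a key is not established here.

```python
# Hypothetical sample data shaped like the items in the report above.
items = [{"poly_center": (1.0, 2.0)}, {"poly_center": (0.0, 5.0)}]

try:
    sorted(items)  # Python 3 cannot compare dict < dict
    raised = False
except TypeError as exc:
    raised = True
    print("sorted() failed:", exc)

# Sorting works once an explicit, orderable key is supplied.
ordered = sorted(items, key=lambda d: d["poly_center"])
print(ordered)
```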
Thanks for raising this. I have to admit that this package has fallen a bit off my radar since I moved jobs a couple of years ago. I'll try to find some time to work on this. The package certainly could do with some love again... :)
Sure. Is there a quick, hacky fix for this issue that I can apply to get unblocked?
Sorry for the late reply... I wrote the answer above right before hitting the sack. I'll see if I can find something.
@anupamme I've looked at your problem, and it's not related to this issue (#15). I've opened a new one (see #23) for your issue.