python-cluster

KMeansClustering no longer works with custom objects.

Open exhuma opened this issue 11 years ago • 6 comments

The following code gives an error:

from cluster import KMeansClustering
from os import urandom
from pprint import pprint
from random import randint


class ObjectWithMetadata(object):

    def __init__(self, value):
        self.value = value
        self.uid = urandom(10).encode('base64').strip()

    def __repr__(self):
        return 'ObjectWithMetadata({!r}, {!r})'.format(self.value, self.uid)


data = [ObjectWithMetadata(randint(0, 1000))
        for _ in range(200)]

cl = KMeansClustering(data, lambda x, y: float(abs(x.value-y.value)))
clustered = cl.getclusters(10)
pprint(clustered)
print(len(clustered))

The error:

Traceback (most recent call last):
  File "metadata.py", line 21, in <module>
    clustered = cl.getclusters(10)
  File "/home/exhuma/work/github/python-cluster/cluster/method/kmeans.py", line 109, in getclusters
    res = self.assign_item(item, cluster)
  File "/home/exhuma/work/github/python-cluster/cluster/method/kmeans.py", line 124, in assign_item
    if self.distance(item, centroid(cluster)) < self.distance(
  File "/home/exhuma/work/github/python-cluster/cluster/util.py", line 175, in centroid
    for i in range(len(data[0])):
TypeError: object of type 'ObjectWithMetadata' has no len()
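
The last frame of the traceback suggests that `centroid` treats each item as an indexable sequence. A minimal sketch of that assumption follows; `tuple_centroid` and `Item` are hypothetical stand-ins written for illustration, not the library's actual code, and the per-dimension median is likewise an assumption.

```python
from statistics import median


def tuple_centroid(data):
    """Hypothetical stand-in: per-dimension median of a list of tuples."""
    # len(data[0]) only works when the items are sequences such as tuples.
    return tuple(median(item[i] for item in data)
                 for i in range(len(data[0])))


class Item:
    """Plain object with neither __len__ nor __getitem__."""

    def __init__(self, value):
        self.value = value


print(tuple_centroid([(1, 2), (3, 4), (5, 9)]))  # (3, 4)

try:
    tuple_centroid([Item(1), Item(2)])
except TypeError as exc:
    # Same kind of failure as in the traceback above.
    print(exc)
```

A custom distance function alone cannot help here, because the failure happens before the distance function is ever called.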

exhuma, Aug 22 '14 07:08

With the current implementation of KMeansClustering, solving this would get quite messy. Running the K-Means method requires three functions, each of which depends strongly on the nature of the items:

  • distance
  • equality
  • centroid

It would make more sense to require non-tuple data elements to subclass an ABC that declares the above methods as abstract. Distance could then be implemented as __sub__ and equality as __eq__.
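
A minimal sketch of what such an ABC might look like. The names `Clusterable` and `Scalar` are hypothetical, nothing here exists in the library, and using the mean as the centroid is an assumption made only for the example.

```python
from abc import ABC, abstractmethod


class Clusterable(ABC):
    """Hypothetical base class for non-tuple items."""

    @abstractmethod
    def __sub__(self, other):
        """Return the distance to another item as a float."""

    @abstractmethod
    def __eq__(self, other):
        """Return True if both items count as the same point."""

    @classmethod
    @abstractmethod
    def centroid(cls, items):
        """Return a new item representing the centre of items."""


class Scalar(Clusterable):
    """Example item wrapping a single number."""

    def __init__(self, value):
        self.value = value

    def __sub__(self, other):
        return float(abs(self.value - other.value))

    def __eq__(self, other):
        return self.value == other.value

    def __hash__(self):
        # Defining __eq__ removes the default __hash__, so restore one.
        return hash(self.value)

    @classmethod
    def centroid(cls, items):
        # Assumption: the mean is an acceptable centre for this type.
        return cls(sum(i.value for i in items) / len(items))


print(Scalar(1) - Scalar(4))  # 3.0
print(Scalar.centroid([Scalar(1), Scalar(2), Scalar(6)]).value)  # 3.0
```

With this shape, a clustering method would only need `a - b`, `a == b`, and `type(a).centroid(items)`, and would not have to know anything else about the items.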

I will follow semantic versioning, and since this would change the external API, I will postpone it until 2.0.

exhuma, Aug 24 '14 10:08

I believe HierarchicalClustering also does not work with custom objects. For example, I have the following code:

cl = HierarchicalClustering(salary_head_probables_list, lambda x,y: _find_squared_distance(
        x['poly_center'], y['poly_center']))

where salary_head_probables_list is a list of custom objects and _find_squared_distance returns a float.

I am getting this error trace:

File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/cluster/method/hierarchical.py", line 79, in __init__
    BaseClusterMethod.__init__(self, sorted(data), distance_function)
TypeError: unorderable types: dict() < dict()
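
The cause of this error can be reproduced without the library: Python 3 refuses to order dicts, so any call to `sorted()` on a list of them fails. The sketch below uses made-up records, and the exact error wording depends on the Python version. It only illustrates the cause; since the traceback shows the library calling `sorted(data)` itself, pre-sorting the input is not claimed to be a workaround.

```python
# Made-up records shaped like the ones in the snippet above.
records = [{"poly_center": (3, 1)}, {"poly_center": (0, 2)}]

try:
    sorted(records)  # compares dict < dict, which Python 3 rejects
except TypeError as exc:
    print("sorted() failed:", exc)

# An explicit key compares the tuples instead of the dicts themselves.
ordered = sorted(records, key=lambda r: r["poly_center"])
print(ordered[0])  # {'poly_center': (0, 2)}
```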

anupamme, May 10 '17 16:05

Thanks for raising this. I have to admit that this package has fallen a bit off my radar since I moved jobs a couple of years ago. I'll try to find some time to work on this. The package certainly could do with some love again... :)

exhuma, May 10 '17 20:05

Sure. Is there a quick, hacky fix for this issue that I can apply to get unblocked?

anupamme, May 10 '17 20:05

Sorry for the late reply... I wrote the answer above right before hitting the sack. I'll see if I can find something.

exhuma, May 11 '17 07:05

@anupamme I've looked at your problem, and it's not related to this issue (#15). I've opened a new one (see #23) for your issue.

exhuma, May 11 '17 19:05