kmeans_pytorch

Error trying to cluster from numpy

Open carl-offerfit opened this issue 2 years ago • 1 comment

Hi, I'm not really using PyTorch, but I want to use balanced k-means. My code is as follows:

from torch import from_numpy
from balanced_kmeans import kmeans_equal
...
  # load X, a 23000x59 ndarray
  n_cluster = 50
  X_tensor = from_numpy(X)
  choices, centers = kmeans_equal(X_tensor,
                                  num_clusters=n_cluster,
                                  cluster_size=X.shape[0] // n_cluster)

I get the following error: RuntimeError: expand(torch.LongTensor{[59]}, size=[]): the number of sizes provided (0) must be greater or equal to the number of dimensions in the tensor (1)

Am I doing something wrong when creating my tensor from numpy? I apologize, because this is more of a general PyTorch question than one specific to kmeans_pytorch (and to be honest, I'm a total PyTorch newbie). Is there an example anywhere of using kmeans_equal on numpy data? I bet other people would find that useful. Thanks in advance for any tips you can provide!

carl-offerfit, Jun 28 '23 00:06
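
On the question of feeding numpy data to a batched PyTorch function, here is a minimal sketch of the conversion step only. It rests on two assumptions that the thread does not confirm: that kmeans_equal wants a 3-D tensor shaped (batch, points, features), which is what the follow-up comment below suggests, and that a float32 tensor is preferable to numpy's default float64. The small random array is a stand-in for the real 23000x59 data, and kmeans_equal itself is not called.

```python
import numpy as np
import torch

# Stand-in for the real data (assumption: the original X is a 2-D float array).
X = np.random.rand(200, 59)

# from_numpy shares memory with X and keeps its dtype (float64 here).
# .float() casts to float32; whether kmeans_equal needs this is an assumption.
# .unsqueeze(0) adds a leading batch dimension: (200, 59) -> (1, 200, 59).
X_tensor = torch.from_numpy(X).float().unsqueeze(0)

print(X_tensor.shape)
print(X_tensor.dtype)
```

`unsqueeze(0)` gives the same shape as `torch.reshape(t, (1, n, d))` without having to spell out the sizes.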

I got a little further by adding a batch dimension to my data, since kmeans_equal seems to expect one. So now I use:

X_tensor = torch.reshape(torch.from_numpy(X), (1,X.shape[0], X.shape[1]))

But now I get this error:

 line 165, in kmeans_equal
    selected_ind = torch.argsort(cluster_positions, dim=-1)[:, :cluster_size]
IndexError: too many indices for tensor of dimension 1

Process finished with exit code 1
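
The IndexError itself can be reproduced outside the library, which may help narrow things down. The sketch below only shows what the message means: indexing with `[:, :k]` needs at least two dimensions, so the tensor being sorted was 1-D at that point. The names `positions_1d` and `positions_2d` are made up for illustration, and nothing here explains why `cluster_positions` ends up 1-D inside kmeans_equal.

```python
import torch

cluster_size = 3
positions_1d = torch.tensor([4, 1, 3, 0, 2])  # hypothetical 1-D stand-in

error_message = None
try:
    # Two indices on a 1-D tensor: this is the failing pattern from the traceback.
    torch.argsort(positions_1d, dim=-1)[:, :cluster_size]
except IndexError as err:
    error_message = str(err)
print(error_message)

# With a leading dimension the same expression works.
positions_2d = positions_1d.unsqueeze(0)  # shape (1, 5)
selected = torch.argsort(positions_2d, dim=-1)[:, :cluster_size]
print(selected.shape)
```

Printing the shape of each intermediate tensor just before line 165 of the library would show where the expected dimension is lost.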

carl-offerfit, Jun 29 '23 03:06