
Creating an array of faiss.Kmeans objects that uses gpu

Open FalsoMoralista opened this issue 8 months ago • 6 comments

Context: I have a dataset with, say, 1,000 classes, and I want to run K-means on the GPU separately for each class. E.g. (sketch):


import faiss


class KMeansModule:

    def __init__(self, nb_classes, dimensionality=256, n_iter=10, k=5, max_iter=300):

        self.k = k
        self.d = dimensionality
        self.n_iter = n_iter
        # One GPU-backed K-means object per class (assumes nb_classes is an int, hence range())
        self.n_kmeans = [faiss.Kmeans(d=dimensionality, k=k, niter=1, gpu=True, verbose=True) for _ in range(nb_classes)]


    def assign(self, x_i, y_i):
        # Train K-means model for one iteration to initialize centroids
        self.n_kmeans[y_i].train(x_i)
        # Assign vectors to the nearest cluster centroid
        D, I = self.n_kmeans[y_i].index.search(x_i, 1)
        return D, I

But then I came across this, which states: "All GPU indexes are built with a StandardGpuResources object (which is an implementation of the abstract class GpuResources). The resource object contains needed resources for each GPU in use, including an allocation of temporary scratch space (by default, about 2 GB on a 12 GB GPU), cuBLAS handles and CUDA streams."

So I am worried about running into memory issues, given that I will have one K-means object per class. Is there any way to modify those settings? Should I be worried at all?
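
One possible direction, sketched here as an assumption rather than a confirmed answer, is to drop down from faiss.Kmeans to faiss.Clustering and train every class against one shared StandardGpuResources whose scratch space is capped with setTempMemory. The helper names (scratch_budget_bytes, make_shared_resources, train_class_kmeans) and the 256 MiB budget are hypothetical, and the code needs a GPU build of faiss to actually run:

```python
def scratch_budget_bytes(mib):
    """Convert a scratch-space budget in MiB to bytes."""
    return mib * 1024 * 1024


def make_shared_resources(temp_mib=256):
    """Create one resource object to be reused by every per-class K-means run.

    temp_mib is an illustrative value, not a recommendation.
    """
    import faiss  # imported lazily; requires a GPU build of faiss

    res = faiss.StandardGpuResources()
    res.setTempMemory(scratch_budget_bytes(temp_mib))  # cap the scratch allocation
    return res


def train_class_kmeans(res, x, k, niter=10, device=0):
    """Run K-means for one class on the GPU using the shared resources.

    x is assumed to be a C-contiguous float32 NumPy array of shape (n, d).
    """
    import faiss

    d = x.shape[1]
    clus = faiss.Clustering(d, k)
    clus.niter = niter

    cfg = faiss.GpuIndexFlatConfig()
    cfg.device = device
    index = faiss.GpuIndexFlatL2(res, d, cfg)  # reuses res, no new scratch space

    clus.train(x, index)  # after training, the index holds the final centroids
    centroids = faiss.vector_float_to_array(clus.centroids).reshape(k, d)
    D, I = index.search(x, 1)  # nearest-centroid assignment for each vector
    return centroids, D, I
```

With this layout, the idea would be to call make_shared_resources() once and then call train_class_kmeans(res, x_i, k) per class, keeping only the returned centroids instead of 1,000 live GPU indexes. Whether this is needed depends on how faiss.Kmeans allocates resources internally, which I have not verified.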

Suggestions are appreciated. Thanks in advance.

FalsoMoralista • May 31 '24 21:05