faiss
How to modify (override) kmeans centroids and do inference
Summary
I'm a newbie with FAISS, but it works nicely. Maybe I'm not 100% into it yet, so sorry if my questions are silly.
I would like to train a k-means model, do inference, do some maths on the centroids, and redo inference without retraining. What is the best way to do it?
Reproduction instructions
Here's the important part of my class:
import copy

import faiss
import numpy as np

class Clustering():
    def __init__(self, nb_class, start_centr, nb_iter, threshold, gpu=True):
        self.nb_class = nb_class
        self.centroids = start_centr
        self.iter = nb_iter
        self.nb_feat = 16
        self.gpu = gpu
        self.kmeans = faiss.Kmeans(self.nb_feat, self.nb_class, niter=self.iter, gpu=self.gpu)
        self.kmeans.seed = np.random.randint(1234)
        self.kmeans.centroids = self.centroids

    def train_kmean(self, array, nb_feat):
        self.nb_feat = nb_feat
        # this function prepares the data for FAISS - not reported
        array = self.reshape_array_for_faiss(array)
        self.kmeans.train(array, init_centroids=self.centroids)
        _, I = self.kmeans.index.search(array, 1)
        self.centroids = copy.deepcopy(self.kmeans.centroids)
        loss = self.kmeans.obj[-1]
        # this function prepares the data for the rest of the code - not reported
        I = self.reshape_array_for_keras(I)
        return I, loss

    def val_kmean(self, array):
        array = self.reshape_array_for_faiss(array)
        # search the current index for the nearest centroid of each sample
        _, I = self.kmeans.index.search(array, 1)
        loss = self.kmeans.obj[-1]
        I = self.reshape_array_for_keras(I)
        return I, loss
Then in the main code:
[...]
self.clustering = Clustering(2, self.centroids, 30, 10000, gpu=True)
data_kmeans, kloss = self.clustering.train_kmean(data, data.shape[-1])
new_data_kmeans1, _ = self.clustering.val_kmean(new_data)
centroids1 = getattr(self.clustering.kmeans,'centroids')
[...doing maths on centroids1 to obtain centroids2...]
setattr(self.clustering.kmeans,'centroids',centroids2)
new_data_kmeans2, _ = self.clustering.val_kmean(new_data)
However, looking at #1940 and https://gist.github.com/mdouze/9eb96d941c94ef59482a069e5862a650, I have the impression that I do not really update the index.
Should it be something like this?
self.clustering.kmeans.index.add(centroids2)
new_data_kmeans2, _ = self.clustering.val_kmean(new_data)
But how do I override the old centroids?
Running on:
- [ ] CPU
- [x] GPU
Interface:
- [ ] C++
- [x] Python
What do you mean by "inference"? I assume it means searching in the set of centroids. In that case, you can just use the regular knn function on the new centroids; see https://github.com/facebookresearch/faiss/wiki/Brute-force-search-without-an-index
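For illustration, here is a minimal sketch of that brute-force route. It assumes `faiss.knn(xq, xb, k)` is available in the installed version and returns squared L2 distances plus indices. The helper name `assign_to_centroids`, the toy arrays, and the NumPy fallback are hypothetical additions of mine, not something from the wiki page.

```python
import numpy as np

try:
    import faiss  # assumed installed; the sketch falls back to NumPy otherwise
except ImportError:
    faiss = None


def assign_to_centroids(x, centroids):
    """Return (squared L2 distance, index) of the nearest centroid for each row of x."""
    x = np.ascontiguousarray(x, dtype=np.float32)
    centroids = np.ascontiguousarray(centroids, dtype=np.float32)
    if faiss is not None and hasattr(faiss, "knn"):
        # index-free exact search: queries first, "database" (the centroids) second
        D, I = faiss.knn(x, centroids, 1)
    else:
        # plain NumPy equivalent: all pairwise squared distances, then argmin per row
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        I = d2.argmin(axis=1)[:, None]
        D = np.take_along_axis(d2, I, axis=1)
    return D, I


# toy data standing in for the modified centroids and the new samples
centroids2 = np.array([[0, 0], [10, 10]], dtype=np.float32)
new_data = np.array([[1, 1], [9, 9]], dtype=np.float32)
D, I = assign_to_centroids(new_data, centroids2)
```

Because nothing is stored between calls, the centroids can be changed freely and passed in again without touching any index.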
Thanks for the reply. By inference I mean using the same centroids to classify other data.
I will look into the page, but meanwhile I came up with a solution: reset kmeans.index, add the new centroids, then search in it.
self.clustering.kmeans.index.reset()
self.clustering.kmeans.index.add(centroids2)
new_data_kmeans2, _ = self.clustering.val_kmean(new_data)
Is there any important difference between the two methods?