faiss
How to change nlist value dynamically with incremental update of index?
We load our data in chunks and periodically update the faiss index, because we can't load all the data at once for training. For each chunk we compute nlist dynamically, train the index, and add the records. But I suspect that after all the chunks are processed, nlist depends only on the size of the last chunk, not on the total number of records (the index's ntotal). Is this the correct way to do it?
Running on: CPU
labels: help wanted
Interface: Python
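import math
import faiss

def build_index(chunks, model_size):
    # chunks: iterable of float32 arrays of shape (n, model_size)
    faiss_index = None
    for i, emb in enumerate(chunks):
        n = emb.shape[0]
        nlist = int(4 * math.sqrt(n))  # recomputed from the current chunk only
        if not i:
            quantizer = faiss.IndexFlatL2(model_size)
            faiss_index = faiss.IndexIVFFlat(quantizer, model_size, nlist, faiss.METRIC_L2)
            faiss_index.cp.min_points_per_centroid = 5
            faiss_index.nprobe = 4
        else:
            # sets an attribute on the faiss module; the index's nlist is unchanged
            faiss.nlist = nlist
        faiss_index.train(emb)  # train on the database vectors
        print(faiss_index.ntotal)
        faiss_index.add(emb)  # add the vectors and update the index
        print(faiss_index.ntotal)
    return faiss_index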
cc Vikasdubey0551 mdouze Could you please help me with this issue, Thanks.
I don't understand.
Do I need to train the index after each incremental update? We update our indexes from time to time. If so, how can I set nlist for each update of the index?
You can't retrain an index after adding vectors to it.
https://github.com/facebookresearch/faiss/wiki/FAQ#is-re-training-an-index-supported
So you mean I need to train an index on a sample and save it, then add new data? I can't train on the whole dataset, as it's huge and regularly updated. Can I create multiple indexes, train them, and merge them into a single index?
Retraining is only useful if there is a shift in the data distribution. Otherwise you can just add to the same trained index. Note that you cannot merge two indexes that were trained differently.