autofaiss
Automatically create Faiss knn indices with the most optimal similarity search parameters.
This might also be done for the index itself, which would unlock training with more points and building larger indices with lower memory usage.
https://github.com/facebookresearch/faiss/blob/b8fe92dfee9ea6f9c8cae27e4fc3ffeb12b5c4d2/benchs/distributed_ondisk/README.md#distributed-k-means
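The distributed k-means idea linked above can be sketched in a few lines. This is a minimal pure-Python sketch, not faiss's implementation. Each worker assigns only its own shard of training vectors and sends back per-centroid sums and counts, so a coordinator can recompute the centroids without any machine holding the full training set. The function names are hypothetical, and keeping the old centroid for an empty cluster is just one simple choice.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Stats = Tuple[List[List[float]], List[int]]  # per-centroid sums and counts


def nearest_centroid(v: Sequence[float], centroids: Sequence[Vector]) -> int:
    # Index of the centroid with the smallest squared L2 distance to v.
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centroids[j])),
    )


def shard_stats(shard: Sequence[Vector], centroids: Sequence[Vector]) -> Stats:
    # Runs on one worker: assign its vectors, return only sums and counts.
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for v in shard:
        j = nearest_centroid(v, centroids)
        counts[j] += 1
        for i, x in enumerate(v):
            sums[j][i] += x
    return sums, counts


def combine_stats(stats: Sequence[Stats], centroids: Sequence[Vector]) -> List[Vector]:
    # Runs on the coordinator: add up partial statistics, recompute the means.
    k, d = len(centroids), len(centroids[0])
    total_sums = [[0.0] * d for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in stats:
        for j in range(k):
            total_counts[j] += counts[j]
            for i in range(d):
                total_sums[j][i] += sums[j][i]
    # An empty cluster keeps its previous centroid.
    return [
        [s / total_counts[j] for s in total_sums[j]] if total_counts[j] else list(centroids[j])
        for j in range(k)
    ]


def distributed_kmeans_step(shards: Sequence[Sequence[Vector]], centroids: Sequence[Vector]) -> List[Vector]:
    # One k-means iteration; in a real setup each shard_stats call runs remotely.
    return combine_stats([shard_stats(s, centroids) for s in shards], centroids)
```

Because sums and counts are additive, one iteration over N shards gives the same centroids as one iteration over the concatenated data. Only k × d floats and k counts travel per worker.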
Currently, merging in distributed mode requires storing the whole index in memory. Possible strategies:
* improve the faiss merge-into step to avoid putting everything in memory
* producing N index...
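One way to avoid holding the whole index during a merge is to merge inverted list by inverted list. The sketch below is hypothetical and is not the faiss merge code. It assumes each shard exposes a function that returns the ids of one inverted list, standing in for an on-demand read from disk, and it yields one merged list at a time. Codes would be merged the same way; only ids are shown here for brevity.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# Hypothetical shard interface: given an inverted-list number, return the ids
# stored in that list. A real shard would read this list from disk on demand.
ReadList = Callable[[int], List[int]]


def merge_lists_streaming(
    shards: Sequence[ReadList], n_lists: int
) -> Iterator[Tuple[int, List[int]]]:
    """Yield (list_no, merged_ids) one inverted list at a time.

    Only one merged list is held at any moment, so peak memory is bounded by
    the largest inverted list rather than by the whole index.
    """
    for list_no in range(n_lists):
        merged: List[int] = []
        for read_list in shards:
            merged.extend(read_list(list_no))
        yield list_no, merged
```

A caller would write each yielded list to the output file before asking for the next one, so nothing accumulates in memory.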
Using https://github.com/beyondstorage/setup-hdfs
Same for the evaluation set: currently we use the first N vectors for both training and evaluation, which is not ideal, especially if the embedding set is not randomly shuffled.
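A minimal sketch of the alternative, assuming embeddings can be addressed by row index: draw training and evaluation rows at random from the whole set, in one draw without replacement so the two sets never overlap. The function name and the fixed seed are illustrative choices, not existing autofaiss parameters.

```python
import random
from typing import List, Tuple


def sample_train_eval(
    n_vectors: int, n_train: int, n_eval: int, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Pick disjoint random row indices for training and evaluation."""
    if n_train + n_eval > n_vectors:
        raise ValueError("not enough vectors for disjoint train and eval sets")
    rng = random.Random(seed)
    # A single draw without replacement, then a split, guarantees disjointness.
    picked = rng.sample(range(n_vectors), n_train + n_eval)
    # Sorting keeps reads closer to sequential when embeddings live in files.
    return sorted(picked[:n_train]), sorted(picked[n_train:])
```

`random.sample` over a `range` does not materialize the full index list, so this also works for large `n_vectors`.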
`TemporaryDirectory` creates a folder on the local disk, which may not have enough free space. The user should be able to specify the temporary folder (in fact, we already have an option for this).
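Python's `tempfile.TemporaryDirectory` already accepts a `dir` argument, so a user-chosen location can be passed straight through. The sketch below is an assumption-laden illustration: `base_dir` stands in for whatever the existing autofaiss option is called, and the free-space check with `shutil.disk_usage` is an optional extra that fails early instead of midway through a long run.

```python
import shutil
import tempfile
from typing import Optional


def make_temp_dir(base_dir: Optional[str], required_bytes: int) -> tempfile.TemporaryDirectory:
    """Create a temporary directory under base_dir after checking free space.

    base_dir=None falls back to the platform default (e.g. /tmp), which is
    exactly the location that may be too small.
    """
    check_dir = base_dir if base_dir is not None else tempfile.gettempdir()
    free = shutil.disk_usage(check_dir).free
    if free < required_bytes:
        raise OSError(f"only {free} bytes free in {check_dir}, need {required_bytes}")
    return tempfile.TemporaryDirectory(dir=base_dir)
```

Usage would look like `with make_temp_dir("/mnt/big_disk", 50 * 2**30) as tmp: ...`, where the path and size are placeholders.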
The strategy of creating a few small indices would reduce the memory usage during adding and (if using the special merge-on-disk function) completely cap the memory used by autofaiss in...
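As a rough illustration of how a memory cap translates into a number of small indices, here is a hypothetical planner. It assumes the only per-item cost is the stored code plus an 8-byte id, and it ignores the quantizer and any other fixed overhead. For 1 billion items with `PQ32x8` codes (32 bytes each) and an 8 GiB budget, it gives 5 shards.

```python
from typing import List, Tuple


def plan_shards(
    n_items: int, bytes_per_item: int, budget_bytes: int
) -> Tuple[int, List[Tuple[int, int]]]:
    """Split n_items into contiguous [start, end) ranges that each fit in budget_bytes."""
    per_shard = budget_bytes // bytes_per_item
    if per_shard == 0:
        raise ValueError("budget too small for even one item")
    ranges = [
        (start, min(start + per_shard, n_items))
        for start in range(0, n_items, per_shard)
    ]
    return len(ranges), ranges


# 32-byte PQ32x8 code + 8-byte id per item, 8 GiB budget per small index.
n_shards, ranges = plan_shards(1_000_000_000, 32 + 8, 8 * 2**30)
```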
Just tried it, and the new estimation at https://github.com/criteo/autofaiss/pull/81/files doesn't fully capture the memory needed for training. When training an index such as `OPQ32_224,IVF131072_HNSW32,PQ32x8`, faiss trains the index in 2...
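To show the kind of term an estimate can miss, here is a hypothetical back-of-the-envelope calculation for `OPQ32_224,IVF131072_HNSW32,PQ32x8`. It is not the estimator from the PR above. It assumes float32 training vectors stay in memory while their 224-dimensional OPQ-transformed copy exists, plus the 131072 × 224 centroid table, and it ignores the HNSW graph, k-means buffers and PQ codebooks. The 5M training points and input dimension 512 are illustrative values only.

```python
from typing import Dict


def estimate_training_bytes(
    n_train: int,
    d_in: int,
    d_out: int = 224,           # output dimension of OPQ32_224
    n_centroids: int = 131072,  # number of lists in IVF131072
    bytes_per_float: int = 4,   # float32
) -> Dict[str, int]:
    """Rough lower bound on peak memory while training an OPQ + IVF index."""
    parts = {
        "input_vectors": n_train * d_in * bytes_per_float,
        "transformed_vectors": n_train * d_out * bytes_per_float,
        "centroids": n_centroids * d_out * bytes_per_float,
    }
    parts["total"] = sum(parts.values())
    return parts


est = estimate_training_bytes(n_train=5_000_000, d_in=512)
```

Under these assumptions, counting only the input vectors (10.24 GB) would miss about 4.6 GB for the transformed copy and the centroids.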
It takes many minutes to run.
It would significantly decrease the 8-byte overhead of each item. Storing 2^63 items in an index is not possible anyway.
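The arithmetic behind this point, as a sketch: faiss stores ids as 64-bit integers, but if ids are assumed to be dense (0 to n − 1), far fewer bytes suffice. The compact-id scheme and both function names are hypothetical. With 1 billion `PQ32x8` items, 4-byte ids would cut code-plus-id storage from 40 GB to 36 GB.

```python
def min_id_bytes(n_items: int) -> int:
    """Smallest whole number of bytes able to hold every id in 0..n_items-1."""
    if n_items <= 1:
        return 1
    return ((n_items - 1).bit_length() + 7) // 8


def stored_bytes(n_items: int, code_bytes: int, id_bytes: int) -> int:
    """Memory for codes plus ids, ignoring all other index structures."""
    return n_items * (code_bytes + id_bytes)


n = 1_000_000_000
print(min_id_bytes(n))                       # 4
print(stored_bytes(n, 32, 8))                # 40000000000 with 8-byte ids
print(stored_bytes(n, 32, min_id_bytes(n)))  # 36000000000 with 4-byte ids
```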