reconstruct_n() on MacOS causes system error

Open aabor opened this issue 11 months ago • 3 comments

Summary

I am unable to use the vectors reconstructed from an index as a numpy array. The reconstruction itself succeeds, but when I then train a new k-means index on the result, the process crashes with a segmentation fault.

Reproduction instructions

    import faiss
    import numpy as np

    d = 768
    ncentroids = 15
    niter = 2

    faiss_index = faiss.IndexFlatL2(d)
    x0 = np.random.random( (5000, d) )
    faiss_index.add(x0)
    x = faiss_index.reconstruct_n()
    kmeans_index = faiss.Kmeans(d, ncentroids, niter=niter, verbose=True)
    kmeans_index.train(x)

Error message

    Sampling a subset of 3840 / 5000 for training
    Clustering 3840 points in 768D to 15 clusters, redo 1 times, 2 iterations
    Preprocessing in 0.01 s

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
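Exit code 139 means the interpreter was killed by signal 11 (SIGSEGV), so no Python traceback is printed. A minimal sketch, assuming only the standard-library `faulthandler` module, that asks Python to dump the traceback of every thread when such a crash happens. `enable_crash_traceback` is a hypothetical helper name, not part of faiss:

```python
import faulthandler
import sys


def enable_crash_traceback(stream=None):
    """Dump Python tracebacks on fatal signals such as SIGSEGV.

    `stream` must be backed by a real file descriptor and must stay open
    until faulthandler.disable() is called. Defaults to sys.stderr.
    """
    if stream is None:
        stream = sys.stderr
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


# Usage: call this before importing faiss and running the reproduction.
# enable_crash_traceback()
```

Running the script with `python -X faulthandler` has the same effect without changing the code. The dump only shows which Python line was executing at the time of the crash; it does not identify the native cause.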

    faiss-cpu == 1.8.0
    Python 3.11.6 (v3.11.6:8b6ee5ba3b, Oct 2 2023, 11:18:21) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin

Platform

    Model Name: MacBook Pro
    Model Identifier: Mac14,9
    Model Number: MPHE3LL/A
    Chip: Apple M2 Pro
    Total Number of Cores: 10 (6 performance and 4 efficiency)
    Memory: 16 GB
    System Firmware Version: 10151.81.1
    OS Loader Version: 10151.81.1

aabor, Mar 06 '24 21:03

I also found out that faiss fails to work after execution of the following code:

    import nest_asyncio
    nest_asyncio.apply()

This looks like a problem with async applications.

aabor, Mar 07 '24 00:03

I can't repro on 1.7.4, probably an installation error. Please fill in the issue template to show how Faiss was installed.
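For reference, a minimal sketch of how the installation details could be collected with the standard library. `faiss_install_report` is a hypothetical helper, and the distribution names checked (`faiss-cpu`, `faiss-gpu`, `faiss`) are assumptions; a conda install may not be visible under these names:

```python
import platform
import sys
from importlib import metadata


def faiss_install_report(candidates=("faiss-cpu", "faiss-gpu", "faiss")):
    """Collect interpreter, platform, and installed faiss distribution versions."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    for name in candidates:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed under this distribution name.
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in faiss_install_report().items():
        print(f"{key}: {value}")
```

The report does not include the install command itself, which still has to be stated by hand.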

mdouze, Mar 15 '24 08:03

Let me explain what happens in more detail.

If I run the following code:

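    # StorageContext import path assumed for llama-index >= 0.10
    from llama_index.core import StorageContext
    from llama_index.vector_stores.faiss import FaissVectorStore
    import faiss

    # vector_length (embedding dimension) and nodes (parsed documents) are defined elsewhere
    faiss_index = faiss.IndexFlatL2(vector_length)
    storage_context = StorageContext.from_defaults(
        vector_store=FaissVectorStore(faiss_index=faiss_index))
    storage_context.docstore.add_documents(nodes)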

and then call the following somewhere else:

    import faiss
    import numpy as np

    d = 768
    ncentroids = 15
    niter = 2

    faiss_index = faiss.IndexFlatL2(d)
    x0 = np.random.random( (5000, d) )
    faiss_index.add(x0)
    x = faiss_index.reconstruct_n()
    kmeans_index = faiss.Kmeans(d, ncentroids, niter=niter, verbose=True)
    kmeans_index.train(x)

It fails.

I managed to resolve the issue by importing the original faiss library first. The code below works.

    # !!!! don't delete, need to import here
    import faiss

    if __name__ == '__main__':
        # run all functionality in any order here
        pass

It looks like an issue with faiss memory management.

One should import faiss package first, then other packages that depend on it:

    import faiss
    from llama_index.vector_stores.faiss import FaissVectorStore

So, switch the order, like above.
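To catch an accidental reordering (for example by an import sorter), the order can be checked at startup. This is only a sketch: `imported_before` is a hypothetical helper, and it assumes `sys.modules` reflects first-import order, which holds because it is a plain dict that preserves insertion order:

```python
import sys


def imported_before(first, second, modules=None):
    """Return True if module `first` was registered before module `second`.

    `modules` defaults to sys.modules. Returns False if either module
    has not been imported at all.
    """
    if modules is None:
        modules = sys.modules
    names = list(modules)
    if first not in names or second not in names:
        return False
    return names.index(first) < names.index(second)


# Usage, after all imports at the top of the entry-point script:
# assert imported_before("faiss", "llama_index.vector_stores.faiss")
```

The check only confirms the import order; it does not explain why the order matters.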

aabor, Mar 15 '24 16:03

I am unable to reproduce the exit code 139. All of my attempts have resulted in exit code 0. I suggest you file the issue with https://github.com/run-llama/llama_index if the issue persists for you. Otherwise, feel free to file a new issue with fully reproducible code and install commands.
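For anyone comparing exit codes across machines, a minimal sketch that runs a reproduction script in a fresh interpreter and decodes the result. `describe_returncode` and `run_script` are hypothetical helpers, and `repro.py` in the usage comment is a placeholder file name. On POSIX, `subprocess` reports death by signal N as `-N`, while shells and some IDEs report it as `128 + N`, which is where 139 comes from:

```python
import signal
import subprocess
import sys


def describe_returncode(code):
    """Translate a process return code into a short readable string."""
    if code < 0:
        # subprocess convention on POSIX: -N means killed by signal N.
        return f"killed by {signal.Signals(-code).name}"
    if code > 128:
        # Shell convention: 128 + N means killed by signal N.
        try:
            return f"exit code {code} (128 + {signal.Signals(code - 128).name})"
        except ValueError:
            pass  # not a known signal number, fall through
    return f"exit code {code}"


def run_script(path):
    """Run `path` in a fresh interpreter so the import order starts clean."""
    result = subprocess.run([sys.executable, path])
    return result.returncode


# Usage:
# print(describe_returncode(run_script("repro.py")))
```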

asadoughi, Jul 26 '24 17:07