
[Bug]: When I repeatedly call the similarity_search_with_score method, there is a noticeable increase in memory usage in a short period of time.

Open Zengfanxu1111 opened this issue 1 year ago • 3 comments

What happened?

When I repeatedly call the similarity_search_with_score method, there is a noticeable increase in memory usage in a short period of time.

Versions

openai==0.27.8
langchain==0.0.270
chromadb==0.4.6
flask==2.3.2
transformers==4.30.2
sentence-transformers==2.2.2
pandas==2.0.1
fire==0.5.0
loguru==0.7.0
FlagEmbedding==1.1.2
cachetools==5.3.1
gradio==3.50.0

Relevant log output

search_list = search_manager.search(company, 100, threshold)

The search method of search_manager is implemented as follows:
#
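    def search(self, item, k, threshold=0):
        # Build the query text from the input item.
        _, query = self._create_doc(item)
        if self.debug:
            print(query)
        # Fetch the k nearest documents together with their scores.
        search_docs = self.vectorstore.similarity_search_with_score(query, k)
        result = []
        for doc, score in search_docs:
            metadata = doc.metadata
            # Store 1 - score alongside the document metadata.
            metadata['score'] = 1 - score
            result.append(metadata)
        return result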
#
The 'company' input is fairly large, so the amount of computation is large as well.
With k=100, memory usage increases significantly. I would like to know whether chromadb has some kind of short-term caching mechanism.

Zengfanxu1111, Nov 10 '23 07:11

@Zengfanxu1111, this is expected, as Chroma loads a collection into memory the first time it is accessed. Depending on the number of vectors you have in the DB, the memory increase can be significant. The usual formula for estimating memory requirements is: 4 bytes * <number of vectors> * <dimensionality of vectors> + some overhead for the Python process and metadata queries.
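As a rough illustration of the formula above, here is a minimal sketch that computes only the vector part of the estimate. The function names are hypothetical, the 4 bytes per value assumes float32 storage as the formula implies, and the overhead for the Python process and metadata queries is deliberately left out because it cannot be derived from the formula.

```python
BYTES_PER_VALUE = 4  # assumption: one float32 per vector component, per the formula


def estimate_vector_bytes(num_vectors: int, dimensions: int) -> int:
    """Return the vector-only memory estimate in bytes (no process overhead)."""
    return BYTES_PER_VALUE * num_vectors * dimensions


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 ** 2)


if __name__ == "__main__":
    # Hypothetical collection: 1,000,000 vectors of 768 dimensions.
    size = estimate_vector_bytes(1_000_000, 768)
    print(f"{size} bytes = {to_mib(size):.1f} MiB")
```

For comparison, a collection of 2,000 vectors with 3,072 dimensions (the size reported later in this thread) comes to 24,576,000 bytes, or about 23.4 MiB, before overhead.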

Can you confirm that memory increases only the first time you search a collection after the application restarts?
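One way to check this could be to repeat the same call and record memory after each repetition. The sketch below uses the standard-library tracemalloc module; the helper name traced_growth is hypothetical. Note the limitation: tracemalloc only sees allocations made through Python's allocators, so memory held by native index code would not show up here and the process RSS should also be watched with an external tool.

```python
import tracemalloc


def traced_growth(fn, repeats=5):
    """Call fn() `repeats` times and return traced Python memory (bytes) after each call."""
    tracemalloc.start()
    try:
        readings = []
        for _ in range(repeats):
            fn()
            current, _peak = tracemalloc.get_traced_memory()
            readings.append(current)
        return readings
    finally:
        # Always stop tracing so the overhead does not leak into later code.
        tracemalloc.stop()


# Usage with the names from the issue (not runnable without that application):
# readings = traced_growth(lambda: search_manager.search(company, 100, threshold), repeats=20)
# print(readings)
```

If the readings level off after the first call, the growth matches the one-time load described above; readings that keep climbing with every call would point to something else.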

tazarov, Nov 15 '23 13:11

Hi, same problem: memory consumption keeps growing, and once it reaches the limit Chroma stops (the process does not die, so a Docker restart does not work). I have one collection with 2k vectors of 3072 dimensions, and queries arrive at 1-2 rps (k=5 for each).

imsamurai, Feb 19 '24 16:02

any news?

imsamurai, Apr 16 '24 08:04