Size of database cannot grow larger than available RAM?

Open Tylersuard opened this issue 1 year ago • 8 comments

What happened?

I just want to be clear: if I have only 16 GB of RAM on my system, does that mean my ChromaDB cannot be larger than 16 GB? Are there any ways around that? I am working with a dataset that will likely be 1 TB when embedded.

Versions

Chroma 0.4

Relevant log output

No response

Tylersuard avatar Nov 02 '23 15:11 Tylersuard

@Tylersuard, there are a couple of pieces of data that Chroma stores:

  • Vectors - these are loaded into memory, and the formula for calculating memory requirements is: 4 bytes * number of vectors * vector dimensionality
  • Metadata - these are your documents and metadata, stored in a SQLite DB and dynamically loaded into memory as needed (depending on your queries)
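
As a rough illustration of the formula above, here is a minimal Python sketch. The function names, the 1,000,000-vector count, and the 1536-dimension size are hypothetical values chosen for the example; the estimate covers only the raw vector data described in the formula, so treat it as a lower bound rather than an exact figure.

```python
BYTES_PER_VALUE = 4  # from the formula above: 4 bytes per vector component


def estimate_vector_memory_bytes(num_vectors: int, dimensionality: int) -> int:
    """Apply the formula: 4 bytes * number of vectors * vector dimensionality."""
    return BYTES_PER_VALUE * num_vectors * dimensionality


def max_vectors_for_budget(budget_bytes: int, dimensionality: int) -> int:
    """Invert the formula: how many vectors fit in a given memory budget."""
    return budget_bytes // (BYTES_PER_VALUE * dimensionality)


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB for readability."""
    return num_bytes / 1024**3


if __name__ == "__main__":
    # Hypothetical workload: 1,000,000 vectors with 1536 dimensions each.
    needed = estimate_vector_memory_bytes(1_000_000, 1536)
    print(f"Estimated vector memory: {to_gib(needed):.2f} GiB")

    # Hypothetical machine: 16 GiB of RAM, all of it available for vectors.
    fits = max_vectors_for_budget(16 * 1024**3, 1536)
    print(f"Vectors that fit in 16 GiB: {fits}")
```

In practice the operating system and other processes also need memory, so the usable budget would be smaller than the total RAM assumed here.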

As of today, Chroma loads all accessed collections into memory and never unloads them. We are considering ways to allow users to unload collections to save memory and thus decouple DB size from memory size.

tazarov avatar Nov 02 '23 16:11 tazarov

@tazarov Thank you for the answer! I am wondering if the situation has been changed and whether there are any updates.

puyuanOT avatar Dec 14 '23 18:12 puyuanOT

@puyuanOT, I created a small PR that implemented manual unloading, but allowing collections to be unloaded manually from the API would have caused more problems for devs than it solved. We're considering the best approach for this, one that will not invalidate some of Chroma's memory assumptions for both single-node and distributed deployments.

Your input is valuable, so please also consider your requirements for this feature and share them here.

tazarov avatar Dec 14 '23 18:12 tazarov

Any updates on unloading unnecessary collections from memory? I have a DB collection of 4 GB while the SQLite file is 14 GB. @tazarov

ML-Abdula avatar Apr 09 '24 09:04 ML-Abdula

Any update, @tazarov?

baseplate77 avatar May 30 '24 10:05 baseplate77

I am also troubled by this issue. Do you have any updates on unloading unnecessary collections from memory?

yuanpeizhou avatar Jun 15 '24 10:06 yuanpeizhou

@baseplate77, @yuanpeizhou, we implemented an LRU strategy that can unload collections that are not frequently used (assuming you have more than one collection). The functionality has been documented here https://cookbook.chromadb.dev/strategies/memory-management/#lru-cache-strategy
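
To illustrate the general idea of an LRU strategy with a memory budget, here is a toy Python sketch. It is not Chroma's implementation: the class `LRUSegmentCache`, its methods, and the collection names and sizes are all hypothetical, and the real behavior is described in the linked cookbook page.

```python
from collections import OrderedDict
from typing import List


class LRUSegmentCache:
    """Toy byte-budgeted LRU cache (hypothetical, not Chroma's code)."""

    def __init__(self, memory_limit_bytes: int) -> None:
        self.memory_limit_bytes = memory_limit_bytes
        self.used_bytes = 0
        # Maps collection name -> size in bytes, ordered from least to
        # most recently used.
        self._entries = OrderedDict()

    def access(self, name: str, size_bytes: int) -> List[str]:
        """Load or touch a collection and return the names evicted."""
        evicted: List[str] = []
        if name in self._entries:
            # Already loaded: just mark it as most recently used.
            self._entries.move_to_end(name)
            return evicted
        # Evict least recently used collections until the new one fits
        # or nothing is left to evict.
        while self._entries and self.used_bytes + size_bytes > self.memory_limit_bytes:
            old_name, old_size = self._entries.popitem(last=False)
            self.used_bytes -= old_size
            evicted.append(old_name)
        self._entries[name] = size_bytes
        self.used_bytes += size_bytes
        return evicted

    def loaded(self) -> List[str]:
        """Names of loaded collections, least recently used first."""
        return list(self._entries)


if __name__ == "__main__":
    cache = LRUSegmentCache(memory_limit_bytes=100)
    cache.access("docs", 40)
    cache.access("images", 40)
    cache.access("docs", 40)          # touch "docs" so "images" becomes oldest
    print(cache.access("logs", 40))   # evicts the least recently used entry
    print(cache.loaded())
```

Note that in this toy model a single collection larger than the budget is still loaded, because there is nothing else to evict. That is a property of the sketch only and not a confirmed description of how Chroma behaves.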

tazarov avatar Aug 06 '24 05:08 tazarov

@tazarov I could not get the LRU strategy to work despite following the above instructions. I'm running the latest Chroma 0.5.7 inside the official Docker image chromadb/chroma. I'm setting the following env variables on the Docker container:

CHROMA_SEGMENT_CACHE_POLICY="LRU"
CHROMA_MEMORY_LIMIT_BYTES="2500000000"  # ~2.5GB

When I keep inserting vectors into a Chroma database (empty in the beginning), the memory usage happily flies past the set limit until the whole thing crashes due to running out of memory:

[Image: Screenshot 2024-09-20 at 18 35 34]

hpihkala avatar Sep 20 '24 16:09 hpihkala