Indexing 1M random vectors is slow
# %%
import chromadb
client = chromadb.Client()
# %%
import numpy as np
from tqdm import tqdm
# %%
from chromadb.api.types import Documents, EmbeddingFunction, Embeddings

class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, texts: Documents) -> Embeddings:
        # embed the documents somehow
        return np.random.random((len(texts), 512)).tolist()
# %%
collection = client.create_collection(name="clip_image_product", embedding_function=MyEmbeddingFunction())
# %%
collection.count()
# %%
# this takes > 5 hours
for chunk in tqdm(range(10000)):
    vectors = np.random.rand(100, 512)
    collection.add(
        documents=[f"This is a document id{idx+chunk*100}" for idx in range(100)],
        metadatas=[{"color": ["red", "yellow", "blue"][idx%3]} for idx in range(100)],
        ids=[f"id{idx+chunk*100}" for idx in range(100)],
        embeddings=vectors.tolist()
    )
# %%
collection.count()
# %%
# this takes 178 ms
hits = collection.query(
    query_embeddings=np.random.random((1,512)).tolist(),
    n_results=1000
)
So it takes >5 hours to index 1M random vectors, and >100 ms to query the top 1000 vectors from the 1M. Is this the expected performance for Chroma?
My use case has 5-20M vectors, QPS ~= 1000, and P95 latency < 100 ms. Would Chroma be the right tool at this scale?
@junwang-wish you will get far better performance loading batches of 100k+ vectors into Chroma. Can you run your benchmark with that?
Also, what is the size of your machine? 5-20M vectors will need a lot of RAM.
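For reference, a minimal sketch of the same benchmark rewritten with 100k batches (reusing collection, np, and tqdm from the cells above; depending on the Chroma version, a single .add call may be capped at a smaller maximum batch size, in which case split further):

# %%
# Same 1M vectors, but as 10 batches of 100k instead of 10,000 batches of 100.
BATCH = 100_000
for chunk in tqdm(range(10)):
    vectors = np.random.rand(BATCH, 512)
    collection.add(
        documents=[f"This is a document id{idx + chunk * BATCH}" for idx in range(BATCH)],
        metadatas=[{"color": ["red", "yellow", "blue"][idx % 3]} for idx in range(BATCH)],
        ids=[f"id{idx + chunk * BATCH}" for idx in range(BATCH)],
        embeddings=vectors.tolist()
    )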
I have 1.5 TB of memory and >200 CPUs; I'll try loading batches of 100k+ vectors into Chroma instead. Regardless, the 178 ms query time worries me: this is only 1 million vectors, and scaling up to a larger set would make P95 < 100 ms infeasible.
That is quite slow - I also see you are asking for 1000 results. Is that actually how many you need?
Query speed is strongly coupled to n_results - try turning n_results down to 10-50 and see if you get faster results. Additionally, pass an index parameter that sets num_threads to 200; the default is 4, which will not properly utilize your machine.
I've opened #337 to address the underutilization.
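For example, a minimal sketch, assuming a Chroma version that accepts hnsw:* keys in the collection metadata (the exact mechanism for passing index parameters may differ across versions):

# %%
# Recreate the collection with the HNSW index tuned for this machine;
# the hnsw:num_threads key is an assumed metadata parameter.
collection = client.create_collection(
    name="clip_image_product_tuned",
    embedding_function=MyEmbeddingFunction(),
    metadata={"hnsw:num_threads": 200},  # default is 4
)
# %%
# Query with a smaller n_results to see how latency scales with it.
hits = collection.query(
    query_embeddings=np.random.random((1, 512)).tolist(),
    n_results=10
)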
Yeah 1000 is what we need
Yeah - can you try increasing the number of threads? That should help substantially.
@HammadB @jeffchuber I looked around for anything related to performance optimization/fine-tuning of chromadb and couldn't find any. Could you please point me to documentation, or provide a quick write-up?
My query: I want to load 10M rows into chromadb as fast as possible, and it's unclear whether the .add method already auto-optimizes everything when used as in the getting-started doc with local embedding compute.
I want to understand more about:
- Is the optimal max batch size for SentenceTransformers embeddings determined automatically, so that batch embedding can utilize GPU memory to the max, or is it better to tune it manually and pass embeddings explicitly to .add?
- How to configure CPU / GPU threads and processes to optimize for a specific machine.

And I have some doubts about the thread above:
- I see you mentioned 100k inserts would be faster. Could you please explain why? In the above case the embeddings are precomputed, so shouldn't .add be independent of batch size?
- 5-20M vectors would need a lot of RAM. Are there any benchmarks already available to help us estimate the machine size needed?

I'd like to understand where the knobs are and how to set them for the fastest possible insert/query given a choice of embedding model + GPU, CPU, and RAM.
@dk-crazydiv these are good questions - we are working on a performance section of our docs - the issue is here https://github.com/chroma-core/docs/issues/52
Same issue working with 1M documents, each doc very small (< 512 chars). Using the default embedding function, ChromaDB is taking over 18 hours!! When using Instructor-XL, there is no end in sight. I have an A100, so I was hoping it could use the GPU, but for most models there is no way to specify the device. Furthermore, I don't think ChromaDB batches embedding generation, which would make it very slow even if device=cuda were specified. When using Instructor embeddings directly, even with their largest XL model on an A100 with batch size 64, I can generate all embeddings for 1M docs in under 8 hours. It appears ChromaDB isn't really ready yet for anything beyond toy sizes? I love its simplicity and how easy it is to get started, but any real-world usage will involve >> 1M docs.
@sytelus For your use case I would suggest embedding outside of Chroma and then passing the embeddings in.
The code is really quite minimal - https://github.com/chroma-core/chroma/blob/main/chromadb/utils/embedding_functions.py
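For instance, a minimal sketch of embedding outside Chroma with SentenceTransformers on GPU and passing the precomputed vectors in (the model name, batch size, and collection name are illustrative):

from sentence_transformers import SentenceTransformer

# Embed on the GPU in batches; batch_size controls GPU utilization.
model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")
docs = [f"This is a document id{i}" for i in range(1_000_000)]
embeddings = model.encode(docs, batch_size=64, show_progress_bar=True)

# Hand the precomputed vectors straight to Chroma (for a corpus this
# size, split the add into batches as discussed above).
collection = client.create_collection(name="precomputed_embeddings")
collection.add(
    documents=docs,
    ids=[f"id{i}" for i in range(len(docs))],
    embeddings=embeddings.tolist()
)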
Will the .add operation speed up if a large batch of embeddings is loaded at one time?
@thtang Chroma now supports batch inserts and it is much faster than before. If you can load things in batches, you will see a large perf improvement.
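For example, a small sketch of a batched insert over precomputed embeddings (the helper and batch size are illustrative; some Chroma versions cap the maximum batch size per .add call):

def add_in_batches(collection, ids, documents, embeddings, batch_size=50_000):
    # Insert in large chunks rather than one tiny batch at a time.
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        collection.add(
            ids=ids[start:end],
            documents=documents[start:end],
            embeddings=embeddings[start:end]
        )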