[Bug]: peek() causes warning "Delete of nonexisting embedding ID"

Open andrewshvv opened this issue 2 years ago • 5 comments

What happened?

I tried to delete IDs that do not exist in the index. Since then, every peek() call logs the warning Delete of nonexisting embedding ID. @HammadB mentioned that the warnings can be ignored, but peek() still shouldn't trigger them. Related discussion on Discord.

Here is chroma.zip for reproduction.

import chromadb

# db_path is the directory that holds the persisted Chroma data
client = chromadb.PersistentClient(path=db_path)
posts = client.get_or_create_collection(
    name="posts",
    metadata={
        "hnsw:space": "cosine",
        "hnsw:M": 16,
        "hnsw:construction_ef": 200,
    }
)

Versions

Initially I used chromadb==0.4.2. Before filing this issue I switched to chromadb==0.4.5 to check whether the warnings persist; they do.

python = "^3.9.17"

Relevant log output

self._post_collection.peek(limit=0)["ids"]
PyDev console: starting.

2023-08-11 21:59:25 [T:MainThread] WARNING:chromadb.segment.impl.vector.brute_force_index: Delete of nonexisting embedding ID: 29
2023-08-11 21:59:25 [T:MainThread] WARNING:chromadb.segment.impl.vector.brute_force_index: Delete of nonexisting embedding ID: 31
2023-08-11 21:59:25 [T:MainThread] WARNING:chromadb.segment.impl.vector.brute_force_index: Delete of nonexisting embedding ID: 32
2023-08-11 21:59:25 [T:MainThread] WARNING:chromadb.segment.impl.vector.brute_force_index: Delete of nonexisting embedding ID: 33
2023-08-11 21:59:25 [T:MainThread] WARNING:chromadb.segment.impl.vector.brute_force_index: Delete of nonexisting embedding ID: 34
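
Since the warnings are said to be safe to ignore, one possible stopgap is to filter them out with the standard logging module until the underlying behavior is fixed. This is a minimal sketch, not a Chroma feature: the logger name and message prefix are copied from the log output above, while the class and function names are hypothetical.

```python
import logging

# Logger name and message prefix are taken from the log output in this issue.
NOISY_LOGGER = "chromadb.segment.impl.vector.brute_force_index"
NOISY_PREFIX = "Delete of nonexisting embedding ID"


class DropNonexistingDeleteWarning(logging.Filter):
    """Drop only the 'Delete of nonexisting embedding ID' records."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False discards the record; every other record passes through.
        return not record.getMessage().startswith(NOISY_PREFIX)


def silence_nonexisting_delete_warning() -> logging.Filter:
    """Attach the filter to the noisy logger and return it so it can be removed later."""
    noisy_filter = DropNonexistingDeleteWarning()
    logging.getLogger(NOISY_LOGGER).addFilter(noisy_filter)
    return noisy_filter
```

Calling silence_nonexisting_delete_warning() once at startup hides only this message; other warnings from the same logger still come through. This masks the symptom and does not address why peek() triggers the deletes.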

andrewshvv • Aug 11 '23 19:08

I have been able to reproduce it. For some reason it doesn't happen right after the delete, but only after a restart. For reference, my actual code calls stop() when the program shuts down.

import chromadb

client = chromadb.PersistentClient(path="test")
try:
    client.delete_collection(name="test_collection")
except ValueError:
    pass

collection = client.get_or_create_collection(
    "test_collection",
    metadata={"hnsw:space": "cosine"}
)

collection.add(
    embeddings=[[1, 2, 3]],
    ids=["1"]
)
collection.delete(ids=["3", "4", "5"])


client.stop()  # improvised restart
client = chromadb.PersistentClient(path="test")
collection = client.get_or_create_collection(
    "test_collection",
    metadata={"hnsw:space": "cosine"}
)
print("peek")
collection.peek()["ids"]
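
To confirm the reproduction without reading the console by eye, the warnings emitted during a single call can be collected in memory. The helper below is a rough sketch that relies only on the standard logging module; warnings_during and _ListHandler are hypothetical names, and the default "chromadb" logger name is assumed from the logger prefix seen in the log output of this issue.

```python
import logging
from typing import Callable, List


class _ListHandler(logging.Handler):
    """Collect warning-level (and higher) messages in memory."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def warnings_during(action: Callable[[], object],
                    logger_name: str = "chromadb") -> List[str]:
    """Run action() and return the warnings logged under logger_name meanwhile."""
    logger = logging.getLogger(logger_name)
    handler = _ListHandler()
    # Records from child loggers propagate up to this handler.
    logger.addHandler(handler)
    try:
        action()
    finally:
        # Always detach the handler, even if action() raises.
        logger.removeHandler(handler)
    return handler.messages
```

With the script above, something like warnings_during(lambda: collection.peek()) after the improvised restart should return the "Delete of nonexisting embedding ID" messages if the bug is present.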

andrewshvv • Aug 11 '23 19:08

While trying to replicate the bug I ran into another interesting behavior. Let me know if I should open a separate issue for it.

import chromadb

client = chromadb.PersistentClient(path="test")
try:
    client.delete_collection(name="test_collection")
except ValueError:
    pass

collection = client.get_or_create_collection(
    "test_collection",
    metadata={"hnsw:space": "cosine"}
)

collection.delete(ids=["3", "4", "5"])


Delete of nonexisting embedding ID: 3
--- Logging error ---
Traceback (most recent call last):
  File "/Users/andrey/Library/Caches/pypoetry/virtualenvs/jobsearch-itTcmVTs-py3.9/lib/python3.9/site-packages/chromadb/db/mixins/embeddings_queue.py", line 263, in _notify_one
    sub.callback([embedding])
  File "/Users/andrey/Library/Caches/pypoetry/virtualenvs/jobsearch-itTcmVTs-py3.9/lib/python3.9/site-packages/chromadb/segment/impl/vector/local_persistent_hnsw.py", line 219, in _write_records
    ) is not None or self._brute_force_index.has_id(id)
AttributeError: 'NoneType' object has no attribute 'has_id'
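
The traceback suggests that _brute_force_index was still None when has_id() was called on it. The toy classes below only illustrate that failure mode and one possible guard; they are hypothetical stand-ins, not Chroma's code, and the actual patch may look quite different.

```python
from typing import Optional, Set


class ToyBruteForceIndex:
    """Hypothetical stand-in for an in-memory index that tracks IDs."""

    def __init__(self) -> None:
        self._ids: Set[str] = set()

    def has_id(self, id: str) -> bool:
        return id in self._ids


class ToySegment:
    """Hypothetical segment whose index may not have been created yet."""

    def __init__(self, index: Optional[ToyBruteForceIndex] = None) -> None:
        self._brute_force_index = index

    def exists_unguarded(self, id: str) -> bool:
        # Raises AttributeError when the index is None, as in the traceback.
        return self._brute_force_index.has_id(id)

    def exists_guarded(self, id: str) -> bool:
        # Treat a missing index as "ID not present" instead of failing.
        index = self._brute_force_index
        return index is not None and index.has_id(id)
```

In this sketch, a delete on a freshly created, empty collection corresponds to ToySegment() with no index: the unguarded check fails, while the guarded one reports that the ID is absent.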

andrewshvv • Aug 11 '23 19:08

Yeah can you please file another bug with just that minimal repro, thanks. Will patch

HammadB • Aug 11 '23 19:08

I had the same problem after terminating the embedding generation while debugging. The "solution" was to delete the index once; the problem has not appeared again since.

ttww • Apr 04 '24 07:04