chroma
[Bug]: peek() causes warning "Delete of nonexisting embedding ID"
What happened?
I tried to delete IDs that do not exist in the index. Since then, every peek() call logs the warning "Delete of nonexisting embedding ID". @HammadB mentioned that the warnings can be ignored, but peek() still shouldn't trigger them. Related discussion on Discord.
Here is chroma.zip for reproduction.
import chromadb

# db_path is the directory that holds the persisted database
client = chromadb.PersistentClient(path=db_path)
posts = client.get_or_create_collection(
    name="posts",
    metadata={
        "hnsw:space": "cosine",
        "hnsw:M": 16,
        "hnsw:construction_ef": 200,
    }
)
Versions
Initially I used chromadb==0.4.2. Before filing this issue I switched to chromadb==0.4.5 to check whether the warnings persist. Same result: the warnings still appear.
python = "^3.9.17"
Relevant log output
self._post_collection.peek(limit=0)["ids"]
PyDev console: starting.
2023-08-11 21:59:25 [T:MainThread] WARNING:chromadb.segment.impl.vector.brute_force_index: Delete of nonexisting embedding ID: 29
2023-08-11 21:59:25 [T:MainThread] WARNING:chromadb.segment.impl.vector.brute_force_index: Delete of nonexisting embedding ID: 31
2023-08-11 21:59:25 [T:MainThread] WARNING:chromadb.segment.impl.vector.brute_force_index: Delete of nonexisting embedding ID: 32
2023-08-11 21:59:25 [T:MainThread] WARNING:chromadb.segment.impl.vector.brute_force_index: Delete of nonexisting embedding ID: 33
2023-08-11 21:59:25 [T:MainThread] WARNING:chromadb.segment.impl.vector.brute_force_index: Delete of nonexisting embedding ID: 34
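Since the warnings are reportedly safe to ignore, one stopgap might be to filter out just these records with the standard logging module. This is a minimal sketch, not a fix for the underlying behavior. The logger name and message text are copied from the log lines above; the class and function names are my own.

```python
import logging

# Logger name copied from the log output in this issue.
NOISY_LOGGER = "chromadb.segment.impl.vector.brute_force_index"


class DropMissingIdDeletes(logging.Filter):
    """Drop only the 'Delete of nonexisting embedding ID' records."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False discards the record; all other records pass through.
        return "Delete of nonexisting embedding ID" not in record.getMessage()


def silence_missing_id_warnings() -> logging.Filter:
    """Attach the filter to the noisy logger and return it for later removal."""
    log_filter = DropMissingIdDeletes()
    logging.getLogger(NOISY_LOGGER).addFilter(log_filter)
    return log_filter
```

Calling silence_missing_id_warnings() once at startup should be enough. Other warnings from the same logger are left untouched, and the returned filter can be detached again with removeFilter().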
I was able to reproduce it. For some reason it doesn't happen right after the delete, only after a restart. For reference, my actual code calls stop() when the program exits.
import chromadb

client = chromadb.PersistentClient(path="test")
try:
    client.delete_collection(name="test_collection")
except ValueError:
    pass  # the collection did not exist yet
collection = client.get_or_create_collection(
    "test_collection",
    metadata={"hnsw:space": "cosine"}
)
collection.add(
    embeddings=[[1, 2, 3]],
    ids=["1"]
)
collection.delete(ids=["3", "4", "5"])  # none of these IDs exist
client.stop()  # improvised restart
client = chromadb.PersistentClient(path="test")
collection = client.get_or_create_collection(
    "test_collection",
    metadata={"hnsw:space": "cosine"}
)
print("peek")
collection.peek()["ids"]
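As a possible workaround on the caller side, the delete could be restricted to IDs that are actually present, so that no delete for a missing ID is ever submitted. This is a sketch under assumptions: it relies on collection.get(ids=...) returning only the IDs it finds under the "ids" key, and delete_existing is a hypothetical helper, not part of chromadb. I have not verified that it removes the warning after a restart.

```python
from typing import List, Protocol


class CollectionLike(Protocol):
    """Minimal shape of the collection calls this helper relies on."""

    def get(self, ids: List[str]) -> dict: ...

    def delete(self, ids: List[str]) -> None: ...


def delete_existing(collection: CollectionLike, ids: List[str]) -> List[str]:
    """Delete only the IDs that exist; return the IDs that were deleted."""
    if not ids:
        return []
    # Assumption: get() returns only the IDs it actually found.
    found = set(collection.get(ids=ids)["ids"])
    present = [id_ for id_ in ids if id_ in found]
    if present:
        # Skip the call entirely when nothing matches, so no delete
        # for a missing ID is ever sent.
        collection.delete(ids=present)
    return present
```

With the repro above, delete_existing(collection, ["3", "4", "5"]) would return an empty list and never call delete().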
While trying to replicate the bug, I ran into another interesting behavior. Let me know if I should open a separate issue for it.
import chromadb

client = chromadb.PersistentClient(path="test")
try:
    client.delete_collection(name="test_collection")
except ValueError:
    pass  # the collection did not exist yet
collection = client.get_or_create_collection(
    "test_collection",
    metadata={"hnsw:space": "cosine"}
)
# Delete on a freshly created, empty collection
collection.delete(ids=["3", "4", "5"])
Delete of nonexisting embedding ID: 3
--- Logging error ---
Traceback (most recent call last):
  File "/Users/andrey/Library/Caches/pypoetry/virtualenvs/jobsearch-itTcmVTs-py3.9/lib/python3.9/site-packages/chromadb/db/mixins/embeddings_queue.py", line 263, in _notify_one
    sub.callback([embedding])
  File "/Users/andrey/Library/Caches/pypoetry/virtualenvs/jobsearch-itTcmVTs-py3.9/lib/python3.9/site-packages/chromadb/segment/impl/vector/local_persistent_hnsw.py", line 219, in _write_records
    ) is not None or self._brute_force_index.has_id(id)
AttributeError: 'NoneType' object has no attribute 'has_id'
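The traceback suggests that _brute_force_index is still None when the delete arrives on a collection that has never had anything added. The toy model below only illustrates that failure mode and one possible guard. It is not chromadb's code: the classes are hypothetical stand-ins, and the lazy creation of the index on first add is my assumption.

```python
from typing import Optional, Set


class ToyBruteForceIndex:
    """Stand-in for an in-memory index that tracks IDs."""

    def __init__(self) -> None:
        self._ids: Set[str] = set()

    def add(self, id_: str) -> None:
        self._ids.add(id_)

    def has_id(self, id_: str) -> bool:
        return id_ in self._ids


class ToySegment:
    """Stand-in for a segment whose in-memory index is created lazily."""

    def __init__(self) -> None:
        self._persisted_ids: Set[str] = set()
        # Assumption: the index does not exist until the first add.
        self._brute_force_index: Optional[ToyBruteForceIndex] = None

    def add(self, id_: str) -> None:
        if self._brute_force_index is None:
            self._brute_force_index = ToyBruteForceIndex()
        self._brute_force_index.add(id_)

    def exists_unguarded(self, id_: str) -> bool:
        # Raises AttributeError while the index is still None.
        return id_ in self._persisted_ids or self._brute_force_index.has_id(id_)

    def exists_guarded(self, id_: str) -> bool:
        # Check for None first, so an empty segment simply reports "not found".
        return id_ in self._persisted_ids or (
            self._brute_force_index is not None
            and self._brute_force_index.has_id(id_)
        )
```

On a fresh ToySegment, exists_unguarded("3") raises the same kind of AttributeError as above, while exists_guarded("3") returns False.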
Yeah can you please file another bug with just that minimal repro, thanks. Will patch
I had the same problem after terminating the embedding generation while debugging. The "solution" was to delete the index once; the problem has not appeared again since.