chroma
[Bug]: Resource leak in `delete_collection` (Single-Node Chroma)
What happened?
The bug causes both memory and file handles to leak indefinitely.
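One way to check the "leaks indefinitely" claim is to repeat the create/delete cycle and sample a resource counter after each round. The sketch below is a generic harness. `measure_growth` and `is_monotonic_growth` are hypothetical helpers, not part of Chroma. It assumes that a counter which keeps rising across iterations indicates a resource that is never released.

```python
from typing import Callable, List


def measure_growth(action: Callable[[], None], probe: Callable[[], int], iterations: int) -> List[int]:
    """Run `action` repeatedly and record `probe()` before the first run and after each run."""
    readings = [probe()]  # baseline reading before any iteration
    for _ in range(iterations):
        action()
        readings.append(probe())
    return readings


def is_monotonic_growth(readings: List[int]) -> bool:
    """Return True if every reading is strictly larger than the previous one."""
    return all(later > earlier for earlier, later in zip(readings, readings[1:]))
```

In the repro below, `action` could create a collection, `add` some embeddings, and call `client.delete_collection`, while `probe` could be `lambda: len(process.open_files())`. On affected versions you would expect the readings to keep climbing. That expectation is an assumption based on this report, not a measured result.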
The bug is easy to reproduce:
import os
import chromadb
import numpy as np
import uuid
from chromadb.db.system import SysDB
from chromadb.segment import SegmentType
import psutil
client = chromadb.PersistentClient("delete_resource_leak")
col = client.get_or_create_collection("delete_resource_leak")
process = psutil.Process()
open_files = process.open_files()
print(open_files)
embeddings = np.random.uniform(-1, 1, (100, 1536))
docs = [f"doc_{i}" for i in range(100)]
ids = [f"{uuid.uuid4()}" for i in range(100)]
col.add(
    embeddings=embeddings,
    ids=ids,
    documents=docs,
)
sysdb: SysDB = client._server._sysdb # type: ignore
segments = sysdb.get_segments(collection=col.id)
assert len(segments) == 2
vector_segment = [s for s in segments if s["type"] == SegmentType.HNSW_LOCAL_PERSISTED.value][0]
assert os.path.exists(os.path.join("delete_resource_leak", str(vector_segment["id"])))
client.delete_collection(col.name)
open_files = process.open_files()
print(open_files)
# the assertion below fails: the HNSW segment directory still exists after deletion
assert not os.path.exists(os.path.join("delete_resource_leak", str(vector_segment["id"])))
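To see which handles keep the segment directory alive, you can filter the paths reported by `process.open_files()` down to those that sit inside the vector segment directory. The sketch below is a minimal, hypothetical helper (`handles_under` is not a Chroma or psutil API). It assumes both paths can be resolved with `os.path.realpath`.

```python
import os
from typing import Iterable, List


def handles_under(open_paths: Iterable[str], directory: str) -> List[str]:
    """Return the open file paths located inside `directory` (or equal to it)."""
    root = os.path.realpath(directory)
    leaked = []
    for path in open_paths:
        real = os.path.realpath(path)
        try:
            # commonpath equals root only when `real` is root itself or nested below it
            inside = os.path.commonpath([root, real]) == root
        except ValueError:
            # e.g. paths on different drives on Windows: cannot be inside root
            inside = False
        if inside:
            leaked.append(path)
    return leaked
```

With the repro above, `handles_under([f.path for f in process.open_files()], os.path.join("delete_resource_leak", str(vector_segment["id"])))` would list the HNSW files that are still open after `delete_collection`. An empty list would mean that no handles into the segment directory remain.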
After a collection is deleted, its HNSW segment directory and the related file handles are not released. The issue comes from a change introduced in 0.5.21. The diagram below summarizes the issue:
Versions
Chroma version >0.5.20
Relevant log output
No response