[Bug]: Resource leak in `delete_collection` (Single-Node Chroma)

Open tazarov opened this issue 10 months ago • 0 comments

What happened?

Deleting a collection leaks both memory and file handles, and the leak grows without bound.

The bug is easy to reproduce:

import os
import uuid

import chromadb
import numpy as np
import psutil
from chromadb.db.system import SysDB
from chromadb.segment import SegmentType

client = chromadb.PersistentClient("delete_resource_leak")

col = client.get_or_create_collection("delete_resource_leak")

# Snapshot the open file handles before any data is added.
process = psutil.Process()
open_files = process.open_files()
print(open_files)

# Add 100 random 1536-dimensional embeddings with matching ids and documents.
embeddings = np.random.uniform(-1, 1, (100, 1536))
docs = [f"doc_{i}" for i in range(100)]
ids = [f"{uuid.uuid4()}" for _ in range(100)]
col.add(
    embeddings=embeddings,
    ids=ids,
    documents=docs,
)

# Locate the persisted HNSW segment and confirm its directory exists on disk.
sysdb: SysDB = client._server._sysdb  # type: ignore
segments = sysdb.get_segments(collection=col.id)
assert len(segments) == 2
vector_segment = [s for s in segments if s["type"] == SegmentType.HNSW_LOCAL_PERSISTED.value][0]
segment_dir = os.path.join("delete_resource_leak", str(vector_segment["id"]))
assert os.path.exists(segment_dir)

# Delete the collection, then snapshot the open file handles again.
client.delete_collection(col.name)
open_files = process.open_files()
print(open_files)

# The assertion below fails: the segment directory is still on disk.
assert not os.path.exists(segment_dir)

After a collection is deleted, the HNSW segment directory remains on disk and its file handles are not released. The regression was introduced in 0.5.21. The diagram below sums up the issue:

image
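
The two checks in the reproduction (is the segment directory still on disk, and are there still open handles inside the persist directory) can be factored into small helpers. This is only a sketch: `leftover_segment_dirs` and `open_paths_under` are hypothetical names, not part of Chroma, and the code assumes that segment directories are named after the segment id directly under the persist directory, as in the reproduction above.

```python
import os
from pathlib import Path
from typing import Iterable, List


def leftover_segment_dirs(persist_dir: str, segment_ids: Iterable[object]) -> List[str]:
    """Return the segment directories that still exist under persist_dir.

    Assumption: each segment lives in <persist_dir>/<segment id>.
    """
    root = Path(persist_dir)
    return [
        str(root / str(segment_id))
        for segment_id in segment_ids
        if (root / str(segment_id)).is_dir()
    ]


def open_paths_under(open_paths: Iterable[str], persist_dir: str) -> List[str]:
    """Keep only the open-file paths that are located inside persist_dir."""
    root = os.path.realpath(persist_dir)
    inside = []
    for path in open_paths:
        real = os.path.realpath(path)
        try:
            # commonpath equals root only when real is root or lies below it.
            if os.path.commonpath([root, real]) == root:
                inside.append(path)
        except ValueError:
            # Raised on Windows for paths on different drives: not inside root.
            continue
    return inside
```

In the reproduction, `leftover_segment_dirs("delete_resource_leak", [vector_segment["id"]])` and `open_paths_under([f.path for f in process.open_files()], "delete_resource_leak")` could be evaluated after `client.delete_collection(col.name)`. Without the leak, both would be expected to return empty lists for the deleted segment.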

Versions

Chroma version >0.5.20

Relevant log output

No response

tazarov, Dec 13 '24 16:12