
[Bug]: Size of sqlite3 file not reducing after deleting files

Open · XariZaru opened this issue 7 months ago · 1 comment

What happened?

After deleting all the documents I added, the size of the sqlite3 file stays the same. The size does grow when I add more documents, but only once I add more than I had added previously. Note: run the code a few times first so the file reaches a measurable size. This is my code:

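import os
import asyncio
from langchain_core.embeddings import Embeddings
from langchain_chroma import Chroma  # assumed import; the original snippet uses Chroma without importing it

# embedding_model and load_docs() are defined elsewhere and are not shown in this report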

class ChromaWrapper(Chroma):
    def __init__(self, collection_name: str, embedding_function: Embeddings, persist_directory=None):
        super().__init__(collection_name=collection_name, embedding_function=embedding_function, persist_directory=persist_directory)

    def delete_by_list(self, sources: list[str]):
        assert len(sources) > 0
        
        # Search on this instance rather than the module-level vector_store
        ids = [doc.id for doc in self.similarity_search(
            query="",
            k=45,
            filter={"source": {"$in": sources}},
        )]
        
        if len(ids) > 0:
            self.delete(ids=ids)
            
        return len(ids)

vector_store = ChromaWrapper(
    collection_name=os.environ["CHROMA_DB_NAME"],
    embedding_function=embedding_model,
    persist_directory=os.environ["CHROMA_DB_NAME"]
)

docs = asyncio.run(load_docs(["https://en.wikipedia.org/wiki/Cyrus_the_Great", "https://en.wikipedia.org/wiki/Iranian_peoples#Eastern_Iranian_peoples"]))
vector_store.add_documents(docs)

print("Size after adding documents", os.path.getsize("chroma_documents\\chroma.sqlite3"), "documents:", vector_store.get()["ids"])
vector_store.delete_by_list([doc.metadata["source"] for doc in docs])
print("Size after deleting documents", os.path.getsize("chroma_documents\\chroma.sqlite3"), "documents:", vector_store.get()["ids"])


Versions

Python 3.13.0, chromadb 0.6.3

Relevant log output

Fetching pages: 100%|####################################################################| 2/2 [00:00<00:00,  5.46it/s]
Size after adding documents 8462336 documents: ['08523761-6094-426f-826e-d6a1fcf4a84e', '659d2d85-bf9f-4a03-9706-a372e42f4473']
Number of requested results 45 is greater than number of elements in index 2, updating n_results = 2
Size after deleting documents 8462336 documents: []
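
The identical sizes in the log above match general SQLite behavior: with auto_vacuum off (the usual default), a DELETE puts the freed pages on an internal freelist for reuse instead of returning them to the filesystem, and the file only shrinks after a VACUUM. This would also explain why the file grows only once more data is added than before. The following is a minimal sketch using only Python's standard sqlite3 module on a throwaway database. The table name `docs`, the helper `measure_sizes`, and the row sizes are hypothetical, and nothing here touches Chroma or its schema. Whether chroma.sqlite3 is configured the same way, and whether it is safe to VACUUM it directly, is not established by this report.

```python
import os
import sqlite3
import tempfile


def measure_sizes(n_rows: int = 2000) -> dict[str, int]:
    """Return the database file size after insert, after delete, and after VACUUM."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite3")  # throwaway file, not a Chroma database
        conn = sqlite3.connect(path)
        # Make the common default explicit: freed pages are kept in the file.
        conn.execute("PRAGMA auto_vacuum = NONE")
        conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
        conn.executemany(
            "INSERT INTO docs (body) VALUES (?)",
            (("x" * 1000,) for _ in range(n_rows)),
        )
        conn.commit()
        after_insert = os.path.getsize(path)

        # Deleting rows frees pages inside the file but does not truncate it.
        conn.execute("DELETE FROM docs")
        conn.commit()
        after_delete = os.path.getsize(path)
        free_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]

        # VACUUM rebuilds the database and drops the unused pages.
        conn.execute("VACUUM")
        after_vacuum = os.path.getsize(path)
        conn.close()

    return {
        "after_insert": after_insert,
        "after_delete": after_delete,
        "free_pages": free_pages,
        "after_vacuum": after_vacuum,
    }


if __name__ == "__main__":
    for name, value in measure_sizes().items():
        print(name, value)
```

In this sketch the size after the delete equals the size after the insert, the freelist count is non-zero, and only the VACUUM step reduces the file size.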

XariZaru · Mar 03 '25 02:03