[Bug]: Size of sqlite3 file not reducing after deleting files
What happened?
After deleting all the documents I added, the size of the sqlite3 file stays the same. The file does grow when I add documents, but only once I add more than I had added previously. Note: run the code a few times first so that the file reaches a measurable size. This is my code:
import os
import asyncio
from langchain_chroma import Chroma  # assumed import; it was missing from the original snippet
from langchain_core.embeddings import Embeddings

# embedding_model and load_docs are defined elsewhere and are not shown here.

class ChromaWrapper(Chroma):
    def __init__(self, collection_name: str, embedding_function: Embeddings, persist_directory=None):
        super().__init__(collection_name=collection_name, embedding_function=embedding_function, persist_directory=persist_directory)

    def delete_by_list(self, sources: list[str]):
        """Delete up to 45 documents whose "source" metadata is in sources; return the count."""
        assert len(sources) > 0
        # Search through self rather than the module-level vector_store.
        ids = [doc.id for doc in self.similarity_search(
            query="",
            k=45,
            filter={"source": {"$in": sources}},
        )]
        if len(ids) > 0:
            self.delete(ids=ids)
        return len(ids)

vector_store = ChromaWrapper(
    collection_name=os.environ["CHROMA_DB_NAME"],
    embedding_function=embedding_model,
    persist_directory=os.environ["CHROMA_DB_NAME"]
)

docs = asyncio.run(load_docs(["https://en.wikipedia.org/wiki/Cyrus_the_Great", "https://en.wikipedia.org/wiki/Iranian_peoples#Eastern_Iranian_peoples"]))
vector_store.add_documents(docs)
print("Size after adding documents", os.path.getsize("chroma_documents\\chroma.sqlite3"), "documents:", vector_store.get()["ids"])

vector_store.delete_by_list([doc.metadata["source"] for doc in docs])
print("Size after deleting documents", os.path.getsize("chroma_documents\\chroma.sqlite3"), "documents:", vector_store.get()["ids"])
Versions
Python 3.13.0, chromadb 0.6.3
Relevant log output
Fetching pages: 100%|####################################################################| 2/2 [00:00<00:00, 5.46it/s]
Size after adding documents 8462336 documents: ['08523761-6094-426f-826e-d6a1fcf4a84e', '659d2d85-bf9f-4a03-9706-a372e42f4473']
Number of requested results 45 is greater than number of elements in index 2, updating n_results = 2
Size after deleting documents 8462336 documents: []
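To help narrow this down, here is a minimal sketch that uses only Python's built-in sqlite3 module and no Chroma code at all. With SQLite's default settings, deleted rows leave free pages inside the file, and the file only shrinks once VACUUM runs (or if auto_vacuum was enabled before the tables were created). That would also match what I see when adding documents again: the file does not grow until the freed pages are used up. The function name measure_sizes, the table layout, and the row count are made up for illustration, and I am only assuming that Chroma does not run VACUUM after a delete; I have not verified that.

```python
import os
import sqlite3
import tempfile


def measure_sizes(n_rows: int = 2000) -> tuple[int, int, int]:
    """Return the database file size after insert, after delete, and after VACUUM."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite3")
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")

        # Insert roughly 2 MB of text so the file spans many pages.
        conn.executemany(
            "INSERT INTO docs (body) VALUES (?)",
            [("x" * 1000,) for _ in range(n_rows)],
        )
        conn.commit()
        size_after_insert = os.path.getsize(path)

        # Delete every row; the freed pages stay inside the file.
        conn.execute("DELETE FROM docs")
        conn.commit()
        size_after_delete = os.path.getsize(path)

        # VACUUM rebuilds the database file and drops the free pages.
        conn.execute("VACUUM")
        size_after_vacuum = os.path.getsize(path)

        conn.close()
    return size_after_insert, size_after_delete, size_after_vacuum


if __name__ == "__main__":
    inserted, deleted, vacuumed = measure_sizes()
    print("after insert:", inserted)
    print("after delete:", deleted)
    print("after VACUUM:", vacuumed)
```

If the same pattern applies here, running VACUUM on chroma.sqlite3 while nothing else has the file open might reclaim the space, but I have not tried that against a real Chroma database.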