[Bug]: PersistentClient.delete_collection does not delete persistent folders when uvicorn server is restarted

Open Yuhui0620 opened this issue 2 years ago • 7 comments

What happened?

I have a FastAPI app that checks a config file for changes to its knowledge bases when the server is restarted. For configurations that have been removed, I want to delete the related collections as well. However, the ./chroma/{uuid} folders are not deleted once the uvicorn server has been restarted. I wrote a simple demo to reproduce the problem, as discussed in #1245.

Test demo:


import os

import chromadb
from chromadb.config import Settings
from fastapi import FastAPI

print("chromadb.__version__", chromadb.__version__)


def get_folder_size(start_path: str) -> float:
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(start_path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            # skip if it is symbolic link
            if not os.path.islink(fp):
                total_size += os.path.getsize(fp)

    return total_size / (1024 * 1024)  # convert bytes to megabytes


app = FastAPI()


@app.get("/upsert")
async def upsert():
    """create a collection and upsert docs"""
    script_entry = get_folder_size("./chroma")
    print("on script run", script_entry)

    client = chromadb.PersistentClient(settings=Settings(allow_reset=True))

    client.reset()

    after_reset = get_folder_size("./chroma")
    print("after_reset", after_reset)

    collection = client.get_or_create_collection("fruit")
    collection.upsert(
        documents=["apples", "oranges", "bananas", "pineapples"], ids=["1", "2", "3", "4"]
    )

    # print(collection.query(query_texts=["hawaii"], n_results=1))

    # get the size of the folder called ./chroma

    before_size = get_folder_size("./chroma")
    print("before", before_size)


@app.get("/delete")
async def delete():
    """delete a collection"""
    client = chromadb.PersistentClient(settings=Settings(allow_reset=True))
    client.delete_collection("fruit")
    after_size = get_folder_size("./chroma")
    print("after", after_size)

    # difference
    # print("diff", before_size - after_size)

    client.reset()
    after_reset_end = get_folder_size("./chroma")
    print("after_reset end", after_reset_end)

I started the server with uvicorn main:app --port 8901 --host 0.0.0.0 and called localhost:8901/upsert. The collection was created successfully, and the size of ./chroma grew from 0.140625 MB after the reset to 1.7428932189941406 MB:

chromadb.__version__ 0.4.15
INFO:     Started server process [7336]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8901 (Press CTRL+C to quit)
on script run 0.0
after_reset 0.140625
before 1.7428932189941406
INFO:     127.0.0.1:55650 - "GET /upsert HTTP/1.1" 200 OK

Then I shut the server down with Ctrl+C and restarted it:

INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [7336]

(chroma) E:\PycharmProjects\Test>uvicorn main:app --port 8901 --host 0.0.0.0
chromadb.__version__ 0.4.15
INFO:     Started server process [9124]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8901 (Press CTRL+C to quit)

Finally, I called localhost:8901/delete and found that the size of ./chroma did not change (still 1.7428932189941406 MB), both after delete_collection and after the following client.reset(), because the folder of the fruit collection was not deleted:

after 1.7428932189941406
after_reset end 1.7428932189941406
INFO:     127.0.0.1:56013 - "GET /delete HTTP/1.1" 200 OK
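
To see exactly which folders are left behind rather than only the total size, a small check like the one below could be added to the demo. This is only a sketch: it assumes the per-collection data lives in UUID-named subfolders of ./chroma, as described above, and list_uuid_dirs is a hypothetical helper of mine, not part of chromadb.

```python
import os
import uuid


def list_uuid_dirs(persist_path: str) -> list[str]:
    """Return the names of subfolders of persist_path that parse as UUIDs.

    Assumption: leftover collection data sits in UUID-named subfolders of
    the persist directory. This helper is hypothetical, not a chromadb API.
    """
    if not os.path.isdir(persist_path):
        return []
    found = []
    for name in sorted(os.listdir(persist_path)):
        # Only directories are of interest; skip files such as the SQLite database.
        if not os.path.isdir(os.path.join(persist_path, name)):
            continue
        try:
            uuid.UUID(name)  # raises ValueError if the name is not a UUID
        except ValueError:
            continue
        found.append(name)
    return found
```

Calling print(list_uuid_dirs("./chroma")) before and after client.delete_collection("fruit") would show whether the collection's folder is still on disk.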

Could anyone help me solve this or provide alternative solutions?

Versions

chromadb 0.4.15, Python 3.10, Windows 10

Relevant log output

No response

Yuhui0620 avatar Oct 29 '23 10:10 Yuhui0620