[Bug]: Unable to delete the persistent directory from another program due to PermissionError

Open axiangcoding opened this issue 1 year ago • 15 comments

What happened?

I have two programs: one built with FastAPI, which we call the server, and one for scheduled tasks, which we call the cronjob. The server uses chromadb to create data through the chromadb SDK (wrapped by langchain-chroma), and the cronjob runs to clean up the persist directory.

But after chromadb finishes executing normally in the server, deleting the persist directory from the cronjob fails with this error:

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: '/xxx/chromadb\\773758b4e4e888ab613ce67feb3329b4\\a1a91d6b-3f63-463d-950c-ee1638e306e1\\data_level0.bin'

The code in the server is as follows:

client = chromadb.PersistentClient(presist_path)
client.get_or_create_collection()
# ...do things here...

The code in the cronjob is as follows:

shutil.rmtree(presist_path)

This is very strange. It looks like some resources are not being released. Any idea where I should start debugging?

Versions

Both services have the same version of chromadb installed:

langchain-chroma = "^0.1.1"
chromadb = "^0.5.3"

Relevant log output

shutil.rmtree(folder_full_path)
    │      │      └ '/xxx/chromadb\\773758b4e4e888ab613ce67feb3329b4'
    │      └ <function rmtree at 0x0000014FFD6C8C20>
    └ <module 'shutil' from 'D:\\Program Files\\python311\\Lib\\shutil.py'>

  File "D:\Program Files\python311\Lib\shutil.py", line 759, in rmtree
    return _rmtree_unsafe(path, onerror)
           │              │     └ <function rmtree.<locals>.onerror at 0x0000014F901F7060>
           │              └ '/xxx/chromadb\\773758b4e4e888ab613ce67feb3329b4'
           └ <function _rmtree_unsafe at 0x0000014FFD6C8AE0>
  File "D:\Program Files\python311\Lib\shutil.py", line 617, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
    │              │         └ <function rmtree.<locals>.onerror at 0x0000014F901F7060>
    │              └ '/xxx/chromadb\\773758b4e4e888ab613ce67feb3329b4\\a1a91d6b-3f63-463d-950c-ee1638e306e1'
    └ <function _rmtree_unsafe at 0x0000014FFD6C8AE0>
  File "D:\Program Files\python311\Lib\shutil.py", line 622, in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
    │       │  │       │         │   └ <built-in function exc_info>
    │       │  │       │         └ <module 'sys' (built-in)>
    │       │  │       └ '/xxx/chromadb\\773758b4e4e888ab613ce67feb3329b4\\a1a91d6b-3f63-463d-950c-ee1638e306e1\\data_level0.bin'
    │       │  └ <built-in function unlink>
    │       └ <module 'os' (frozen)>
    └ <function rmtree.<locals>.onerror at 0x0000014F901F7060>
  File "D:\Program Files\python311\Lib\shutil.py", line 620, in _rmtree_unsafe
    os.unlink(fullname)
    │  │      └ '/xxx/chromadb\\773758b4e4e888ab613ce67feb3329b4\\a1a91d6b-3f63-463d-950c-ee1638e306e1\\data_level0.bin'
    │  └ <built-in function unlink>
    └ <module 'os' (frozen)>

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: '/xxx/chromadb\\773758b4e4e888ab613ce67feb3329b4\\a1a91d6b-3f63-463d-950c-ee1638e306e1\\data_level0.bin'

axiangcoding avatar Jul 03 '24 08:07 axiangcoding

It looks like this issue has been mentioned before at https://github.com/chroma-core/chroma/issues/1009#issuecomment-1695083394 and https://github.com/chroma-core/chroma/issues/1152. Could it be a Windows-only issue?

axiangcoding avatar Jul 03 '24 08:07 axiangcoding

Hey @axiangcoding, you are hitting a Windows-related problem where (not necessarily your case) a Windows admin process (e.g., MS Defender) holds the file for a little while after another process has accessed it.

However, your case might be slightly different: since your cronjob is a separate process, your FastAPI process may be the one holding the file. Before running shutil.rmtree(presist_path), do you delete the collection whose data you're cleaning up (e.g. client.delete_collection("col_name"))?
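For the transient case (another process briefly holding the file), a minimal sketch of retrying the delete with a growing delay is shown here. The function name `rmtree_with_retry` and its parameters are hypothetical, not part of chromadb or the standard library. It only helps when the lock is short-lived; if the server process keeps the file open indefinitely, every attempt fails and the function returns False.

```python
import shutil
import time
from typing import Callable


def rmtree_with_retry(
    path: str,
    attempts: int = 5,
    initial_delay: float = 0.5,
    remove: Callable[[str], None] = shutil.rmtree,  # injectable for testing
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to delete `path`, retrying on PermissionError with a doubling delay.

    Returns True if the directory is gone, False if it was still locked
    after the last attempt. Other OSError types are not caught.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            remove(path)
            return True
        except FileNotFoundError:
            return True  # already deleted, nothing left to do
        except PermissionError:
            if attempt == attempts:
                return False  # still locked; give up
            sleep(delay)
            delay *= 2
    return False
```

In the cronjob this would replace the bare shutil.rmtree(presist_path) call, with the return value deciding whether to log a failure and try again on the next run.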

tazarov avatar Jul 04 '24 05:07 tazarov

Hi @tazarov, I tried client.delete_collection() before removing the directory, but it only cleaned up the data in the sqlite database and didn't delete any files.

By the way, I tried client.reset() too, and it didn't delete any files either.

axiangcoding avatar Jul 04 '24 14:07 axiangcoding

@axiangcoding, can you trace which process holds the lock on the file? As an alternative, have you tried running things in a Docker container with a volume instead of a directory mount?

tazarov avatar Jul 06 '24 13:07 tazarov

I just found out that it is the FastAPI process that holds the file lock. I'll try the other suggestions later.

axiangcoding avatar Jul 07 '24 06:07 axiangcoding

@axiangcoding, thanks for confirming. That means that Chroma hasn't released the file yet.

tazarov avatar Jul 07 '24 07:07 tazarov

Hi @tazarov, I've tested it in k8s with an azure-file volume mounted at the persistent directory, which both the FastAPI pod and the cronjob pod can access. The cronjob pod still could not delete the directory. (I think a Docker volume would behave the same.)

The error has changed:

Directory not empty

I also tried the rm -rf command and got the same error. I need to restart the FastAPI pod several times before I can delete the directory.
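In a Linux pod, one way to check whether a process still has files under the directory open is to scan /proc. The sketch below is only an illustration: the function name `pids_holding_files_under` is hypothetical, it needs permission to read other processes' fd entries, it only sees processes in the same PID namespace (so it would have to run inside the FastAPI pod), and it does not cover memory-mapped files, which show up in /proc/<pid>/maps instead.

```python
import os
from pathlib import Path


def pids_holding_files_under(directory: str) -> dict[int, list[str]]:
    """Map PID -> open file paths located under `directory` (Linux only).

    Returns an empty dict when /proc is not available.
    """
    root = os.path.realpath(directory)
    holders: dict[int, list[str]] = {}
    proc = Path("/proc")
    if not proc.is_dir():
        return holders  # not Linux, or /proc is not mounted
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue  # only numeric entries are processes
        try:
            fds = list((entry / "fd").iterdir())
        except OSError:
            continue  # process exited or we lack permission
        for fd in fds:
            try:
                target = os.readlink(fd)
            except OSError:
                continue  # descriptor closed while we were scanning
            if target == root or target.startswith(root + os.sep):
                holders.setdefault(int(entry.name), []).append(target)
    return holders
```

If the FastAPI process ID appears in the result with a path such as data_level0.bin, that confirms the server still holds the file.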

axiangcoding avatar Jul 19 '24 08:07 axiangcoding

Hello @axiangcoding , did you manage to solve this? Currently experiencing the exact same issue using FastAPI. Calling reset() does not work.

CarlaFernandez avatar Nov 04 '24 09:11 CarlaFernandez

I believe it just can't be done. That is not how Chroma persistence works.

axiangcoding avatar Nov 04 '24 14:11 axiangcoding

Thanks for the reply. In my case, I finally managed to solve it by doing the following:

    import gc
    import chromadb

    # chroma_db_path and global_settings are defined elsewhere; reset() requires allow_reset=True in the settings
    chroma_client = chromadb.PersistentClient(path=chroma_db_path, settings=global_settings)
    chroma_client.delete_collection("project_collection")  # Remove any data from the chroma store
    chroma_client.clear_system_cache()
    chroma_client.reset()
    del chroma_client  # Remove the reference to the client
    gc.collect()       # Force garbage collection

CarlaFernandez avatar Nov 04 '24 14:11 CarlaFernandez

A follow-up to explain that I also needed to add the following:

subprocess.run(['rm', '-rf', chromadb_name], check=True)

I needed this because the snippet above stopped working. It feels like a patch on top of something that should work out of the box, but it is the best I have been able to come up with, since Chroma outputs no further logs.
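Since there are no logs to go on, a standard-library alternative to shelling out to rm -rf can at least report which files could not be removed. This is a minimal sketch; `remove_tree_collect_failures` is a hypothetical helper name, not a chromadb or shutil function.

```python
import os


def remove_tree_collect_failures(root: str) -> list[tuple[str, OSError]]:
    """Delete as much of `root` as possible and return (path, error) pairs
    for every file or directory that could not be removed."""
    failures: list[tuple[str, OSError]] = []
    # Walk bottom-up so files are removed before their parent directories.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        # Symlinked directories are listed in dirnames but not walked,
        # so unlink them together with the regular files.
        links = [d for d in dirnames if os.path.islink(os.path.join(dirpath, d))]
        for name in filenames + links:
            path = os.path.join(dirpath, name)
            try:
                os.unlink(path)
            except OSError as exc:
                failures.append((path, exc))
        try:
            os.rmdir(dirpath)
        except OSError as exc:
            failures.append((dirpath, exc))
    return failures
```

An empty list means the directory is gone. Otherwise each entry names a path that is still held (for example data_level0.bin) together with the error raised, which is more to go on than a bare failure from rm.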

CarlaFernandez avatar Nov 12 '24 14:11 CarlaFernandez

I don't recommend chromadb in scenarios where writing data is separated from reading data (similar to my case of providing two APIs in a web service, one for saving vector data and one for searching), which is not a typical use case for chromadb. In my scenario, I've moved on to other vector databases.

axiangcoding avatar Nov 13 '24 00:11 axiangcoding

@axiangcoding can you verify whether you still have this issue with an up-to-date Chroma version (0.6.3 or later)? Did you try our CLI vacuum command?

itaismith avatar Jan 15 '25 09:01 itaismith

I tried the code @CarlaFernandez posted above (delete_collection, clear_system_cache, reset, then del and gc.collect) and got a different error: OperationalError: attempt to write a readonly database. How can I delete the directory?

doyoungim999 avatar Mar 13 '25 02:03 doyoungim999

When you need to clean the cache, set up your client without persistence, then delete the folder:

    vectorstore = Chroma(persist_directory=None)
    shutil.rmtree(chroma_persist_directory)

Then reload the store:

    vectorstore = Chroma.from_documents( ... persist_directory=chroma_persist_directory, )

EDIT: I just read that the OP is doing this in a separate process, which might be an issue unless you are calling the FastAPI service from your cron.

EDIT: It doesn't always work either.

Zunair avatar Apr 04 '25 20:04 Zunair