[Bug]: Unable to delete the persistent directory from another program due to PermissionError
What happened?
I have two programs: one built with FastAPI, which we'll call the server, and one for scheduled tasks, which we'll call the cronjob.
In the server I use chromadb to write data through the chromadb SDK (wrapped by langchain-chroma), and the cronjob runs periodically to clean up the persist directory.
But after chromadb finishes executing normally in the server, deleting the persist directory from the cronjob raises an error.
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: '/xxx/chromadb\\773758b4e4e888ab613ce67feb3329b4\\a1a91d6b-3f63-463d-950c-ee1638e306e1\\data_level0.bin'
Code in the server is as follows:
import chromadb

client = chromadb.PersistentClient(presist_path)
collection = client.get_or_create_collection("my_collection")  # collection name is a placeholder
# ...do things here...
Code in the cronjob is as follows:
import shutil

shutil.rmtree(presist_path)
This is very strange. It looks like some resources are not being released. Any idea where I should start debugging?
Versions
Both services have the same version of chromadb installed:
langchain-chroma = "^0.1.1"
chromadb = "^0.5.3"
Relevant log output
shutil.rmtree(folder_full_path)
│ │ └ '/xxx/chromadb\\773758b4e4e888ab613ce67feb3329b4'
│ └ <function rmtree at 0x0000014FFD6C8C20>
└ <module 'shutil' from 'D:\\Program Files\\python311\\Lib\\shutil.py'>
File "D:\Program Files\python311\Lib\shutil.py", line 759, in rmtree
return _rmtree_unsafe(path, onerror)
│ │ └ <function rmtree.<locals>.onerror at 0x0000014F901F7060>
│ └ '/xxx/chromadb\\773758b4e4e888ab613ce67feb3329b4'
└ <function _rmtree_unsafe at 0x0000014FFD6C8AE0>
File "D:\Program Files\python311\Lib\shutil.py", line 617, in _rmtree_unsafe
_rmtree_unsafe(fullname, onerror)
│ │ └ <function rmtree.<locals>.onerror at 0x0000014F901F7060>
│ └ '/xxx/chromadb\\773758b4e4e888ab613ce67feb3329b4\\a1a91d6b-3f63-463d-950c-ee1638e306e1'
└ <function _rmtree_unsafe at 0x0000014FFD6C8AE0>
File "D:\Program Files\python311\Lib\shutil.py", line 622, in _rmtree_unsafe
onerror(os.unlink, fullname, sys.exc_info())
│ │ │ │ │ └ <built-in function exc_info>
│ │ │ │ └ <module 'sys' (built-in)>
│ │ │ └ '/xxx/chromadb\\773758b4e4e888ab613ce67feb3329b4\\a1a91d6b-3f63-463d-950c-ee1638e306e1\\data_level0.bin'
│ │ └ <built-in function unlink>
│ └ <module 'os' (frozen)>
└ <function rmtree.<locals>.onerror at 0x0000014F901F7060>
File "D:\Program Files\python311\Lib\shutil.py", line 620, in _rmtree_unsafe
os.unlink(fullname)
│ │ └ '/xxx/chromadb\\773758b4e4e888ab613ce67feb3329b4\\a1a91d6b-3f63-463d-950c-ee1638e306e1\\data_level0.bin'
│ └ <built-in function unlink>
└ <module 'os' (frozen)>
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: '/xxx/chromadb\\773758b4e4e888ab613ce67feb3329b4\\a1a91d6b-3f63-463d-950c-ee1638e306e1\\data_level0.bin'
It looks like this issue has been mentioned before in https://github.com/chroma-core/chroma/issues/1009#issuecomment-1695083394 and https://github.com/chroma-core/chroma/issues/1152. Could this be a Windows-only issue?
Hey @axiangcoding, you are hitting a Windows-related problem where (not necessarily your case) a Windows admin process (e.g., MS Defender) holds the file for a little while after another process has accessed it.
However, your case might be slightly different, as your FastAPI process can hold the file, given that your cronjob is a separate process. Before running shutil.rmtree(presist_path), do you delete the collection whose data you are cleaning up (e.g., client.delete_collection("col_name"))?
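To make that suggestion concrete, here is a minimal sketch of what the cronjob could try, assuming a collection named "col_name" and a short retry loop for transient Windows file locks (both the name and the retry are illustrative assumptions, not from the original report):

import gc
import shutil
import time

import chromadb

def cleanup(presist_path, collection_name="col_name"):
    # Ask Chroma to drop the collection first so its segment files are no longer referenced.
    client = chromadb.PersistentClient(presist_path)
    try:
        client.delete_collection(collection_name)
    except Exception:
        pass  # the collection may already be gone
    del client
    gc.collect()
    # Retry the removal a few times in case the lock (e.g. MS Defender) is only transient.
    for attempt in range(5):
        try:
            shutil.rmtree(presist_path)
            return
        except PermissionError:
            time.sleep(2 ** attempt)
    shutil.rmtree(presist_path)  # final attempt; let the error propagate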
Hi @tazarov, I have tried client.delete_collection() before removing the directory, but it only cleaned up the data in the SQLite database and didn't delete any files.
By the way, I tried client.reset() too; it didn't delete any files either.
@axiangcoding, can you trace which process keeps the lock on the file? As an alternative, have you tried running things in a Docker container with a volume instead of a directory mount?
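One way to trace the lock holder on Windows is a minimal sketch like the following, assuming the third-party psutil package is installed (Sysinternals handle.exe or Resource Monitor can answer the same question without code):

import psutil

# Name of the locked file from the traceback above; matching by name avoids hard-coding the redacted path.
target_name = "data_level0.bin"

for proc in psutil.process_iter(["pid", "name"]):
    try:
        # open_files() lists the regular files a process currently has open.
        if any(f.path.endswith(target_name) for f in proc.open_files()):
            print(proc.info["pid"], proc.info["name"])
    except (psutil.AccessDenied, psutil.NoSuchProcess):
        continue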
I've just found out that it is the FastAPI process that keeps the file lock. I'll try the other suggestions later.
@axiangcoding, thanks for confirming. That means that Chroma hasn't released the file yet.
Hi @tazarov, I've tested it in k8s with an Azure Files share mounted as the persistent directory, accessible from both the FastAPI and cronjob pods. The cronjob pod still could not delete the directory (I think a Docker volume would behave the same).
The error has changed:
Directory not empty
I also tried the rm -rf command and got the same error. I had to restart the FastAPI pod several times before I could delete the directory.
Hello @axiangcoding, did you manage to solve this? I'm currently experiencing the exact same issue using FastAPI. Calling reset() does not work.
I believe it just can't. That is not how Chroma's persistence works.
Thanks for the reply. In my case, I finally managed to solve it by doing the following:
import gc

import chromadb

chroma_client = chromadb.PersistentClient(path=chroma_db_path, settings=global_settings)
chroma_client.delete_collection("project_collection")  # Remove any data from the chroma store
chroma_client.clear_system_cache()
chroma_client.reset()  # Requires allow_reset=True in the client settings
del chroma_client  # Remove the reference to the client
gc.collect()  # Force garbage collection
This is just an additional note to explain that I later also needed to add:
import subprocess

subprocess.run(['rm', '-rf', chromadb_name], check=True)
because the above stopped working on its own. It feels like a patch on top of something that should work out of the box, but it is the best I have been able to reach, since no further logs are output by Chroma.
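Putting the last two comments together, a hedged sketch of the whole cleanup sequence (chroma_db_path, global_settings, and the collection name come from the snippets above; using shutil.rmtree instead of calling rm is my own substitution for portability):

import gc
import shutil

import chromadb

def wipe_chroma_store(chroma_db_path, global_settings, collection_name="project_collection"):
    client = chromadb.PersistentClient(path=chroma_db_path, settings=global_settings)
    client.delete_collection(collection_name)  # drop the collection's rows from SQLite
    client.clear_system_cache()                # forget the cached client/system instance
    client.reset()                             # requires allow_reset=True in global_settings
    del client                                 # drop the last Python reference
    gc.collect()                               # encourage open file handles to be released
    shutil.rmtree(chroma_db_path, ignore_errors=True)  # remove whatever is left on disk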
I don't recommend chromadb in scenarios where writing data is separated from reading data (as in my case of providing two APIs in a web service, one for saving vector data and one for searching), which is not a typical chromadb use case. In my scenario, I've moved on to other vector databases.
@axiangcoding, can you verify whether you still have this issue with an up-to-date Chroma version (0.6.3 or later)? Did you try our CLI vacuum command?
I tried the code above and got another error: OperationalError: attempt to write a readonly database. How can I delete it?
When you need to clean the cache: set up your client without persistence, then delete the folder:
vectorstore = Chroma(persist_directory=None)
shutil.rmtree(chroma_persist_directory)
Then reload the store:
vectorstore = Chroma.from_documents(
    ...,
    persist_directory=chroma_persist_directory,
)
EDIT: I just read the OP; doing this in a separate process might be an issue unless you are calling the FastAPI app from your cron.
EDIT: It doesn't always work either.