Document Store - closing resources
In most document stores, we establish a connection with some resource but never explicitly close it or destroy it; most of the clients we wrap in our document stores (e.g., ElasticSearch, Mongo, Chroma, etc.) have an init()/connect() and also a close() method.
We could add a method that is triggered when the doc store is destroyed (e.g., based on del), and then release all the allocated resources and established connections.
For more controlled cleanup, we can also consider adding an explicit close() method that users can call when they're done with the document store.
This was already previously discussed for Weavite https://github.com/deepset-ai/haystack-core-integrations/issues/471
to release async connections/resources the equivalent is __aexit__
I encountered this topic when looking at #1634.
I think that we should find a common approach to handle this. Here's my idea:
class MyDocumentStore:
def close(self):
# Clean up sync resources
...
async def close_async(self):
# Clean up async resources
...
def __del__(self):
# not necessary, it might be nice to have
try:
self.close()
except Exception:
pass
-
The
close()method is intended for synchronous cleanup. -
The
close_async()method is for async-aware clients that requireawaitto release resources. -
__del__acts as a fallback but should not be relied on for cleanup on async environment. Can be removed if it creates misleading expectations.
In practice, I expect users working with async to call await close_async() manually.
I would like to discuss this also with @mpangrazzi and @Amnah199.
UPDATE
I discussed this a bit with Michele.
- The approach above seems reasonable.
- Each Document Store has a different client implementation, so the implementation of
close/close_asyncwill vary; in some specific situations (especially for sync), these methods might not even be needed. - This proposal alone doesn't solve the problem of properly closing resources when using Async Pipelines. Based on my initial investigation, solving this at the component level (e.g., via a
DocumentStoreResourceClosercomponent) does not seem feasible because each component in the Pipeline using the same Document Store is actually creating a new Document Store instance. It would be possible to solve this operating at the Pipeline level.
Refining my previous approach...
-
__del__is not necessary and I would avoid it for symmetry reasons: it would only work for sync environment where we don't have issues. -
closeandclose_asyncshould also modify the Document Store state (client?index? Different implementations in different DBs), making it possible for example to close it, do other operations likecount_documents(that internally re-open the connection) and then close it again.