haystack-core-integrations icon indicating copy to clipboard operation
haystack-core-integrations copied to clipboard

Document Store - closing resources

Open davidsbatista opened this issue 11 months ago • 4 comments

In most document stores, we establish a connection with some resource but never explicitly close it or destroy it; most of the clients we wrap in our document stores (e.g., ElasticSearch, Mongo, Chroma, etc.) have an init()/connect() and also a close() method.

We could add a method that is triggered when the doc store is destroyed (e.g., based on del), and then release all the allocated resources and established connections.

For more controlled cleanup, we can also consider adding an explicit close() method that users can call when they're done with the document store.

davidsbatista avatar Apr 08 '25 15:04 davidsbatista

This was already previously discussed for Weavite https://github.com/deepset-ai/haystack-core-integrations/issues/471

davidsbatista avatar Apr 08 '25 15:04 davidsbatista

to release async connections/resources the equivalent is __aexit__

davidsbatista avatar Apr 09 '25 13:04 davidsbatista

I encountered this topic when looking at #1634.

I think that we should find a common approach to handle this. Here's my idea:

class MyDocumentStore:
    def close(self):
        # Clean up sync resources
        ...

    async def close_async(self):
        # Clean up async resources
        ...

    def __del__(self):
        # not necessary, it might be nice to have
        try:
            self.close()
        except Exception:
            pass
  • The close() method is intended for synchronous cleanup.

  • The close_async() method is for async-aware clients that require await to release resources.

  • __del__ acts as a fallback but should not be relied on for cleanup on async environment. Can be removed if it creates misleading expectations.

In practice, I expect users working with async to call await close_async() manually.

I would like to discuss this also with @mpangrazzi and @Amnah199.


UPDATE

I discussed this a bit with Michele.

  • The approach above seems reasonable.
  • Each Document Store has a different client implementation, so the implementation of close/close_asyncwill vary; in some specific situations (especially for sync), these methods might not even be needed.
  • This proposal alone doesn't solve the problem of properly closing resources when using Async Pipelines. Based on my initial investigation, solving this at the component level (e.g., via a DocumentStoreResourceClosercomponent) does not seem feasible because each component in the Pipeline using the same Document Store is actually creating a new Document Store instance. It would be possible to solve this operating at the Pipeline level.

anakin87 avatar Jun 18 '25 14:06 anakin87

Refining my previous approach...

  • __del__ is not necessary and I would avoid it for symmetry reasons: it would only work for sync environment where we don't have issues.
  • close and close_async should also modify the Document Store state (client? index? Different implementations in different DBs), making it possible for example to close it, do other operations like count_documents (that internally re-open the connection) and then close it again.

anakin87 avatar Jul 25 '25 10:07 anakin87