elasticsearch-py icon indicating copy to clipboard operation
elasticsearch-py copied to clipboard

Add doc warning that releasing an `Elasticsearch` object to GC can trigger ResourceWarnings if the transport is not closed

Open sirosen opened this issue 3 years ago • 1 comments

I have an application which was creating and discarding Elasticsearch objects on the fly in a few contexts. Unfortunately, when GC runs and cleans up Elasticsearch, it cleans out the transport, a urllib3 connection pool in my case, and doesn't close it first. This doesn't cause any behavioral issues but it's a resource leak. When we turned on pytest filterwarnings=error, we got mysterious ResourceWarnings in our test runs which blew up.

It took a bit of digging to figure out what was happening because those warnings were thrown in __del__ on sockets during GC. (Cue: me learning about the existence of sys.unraisablehook. 😱 )

I don't believe that Elasticsearch should implement __del__ to close the transport. urllib3 has made the decision that leaving cleanup entirely for GC is incorrect, and wrapping things to do it in elasticsearch-py would subvert that decision.

However, because Elasticsearch is two layers of abstraction away from this warning (elasticsearch-py -> urllib3 -> socket), it's hard to know that you really should close the transport before leaving Elasticsearch for GC. The introduction of __enter__ and __exit__ and close in the 7.x-line hints at this, but it's non-obvious in the docs. Perhaps it's sufficient to add an explicit es.close() call to the example usage? Or maybe there should be a doc note or warning about the fact that you should always close Elasticsearch objects before discarding? Apologies if there's already such a note -- I couldn't find one.

Also, less critically, I'm still on 6.x right now, so a note for the 6.x docs specifically to call out es.transport.close() would have helped me. I'm happy to submit a PR for this or other changes with some guidance/input on where it should go.

sirosen avatar Feb 12 '21 00:02 sirosen

Same issue here, I'm getting sporadic ResourceWarning even if Elasticsearch is still "alive" (long lived object).

Interestingly that the error appears from different unrelated places (I suppose it is because GC gets triggered randomly):

/usr/local/lib/python3.9/site-packages/pandas/core/series.py:536: ResourceWarning: unclosed <ssl.SSLSocket fd=24, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('<IP>', 38860), raddr=('<IP>', 9243)>
  return self._name
ResourceWarning: Enable tracemalloc to get the object allocation traceback

Obviously in this case, pandas has nothing to do with Elastic

vkolotov avatar Oct 19 '21 09:10 vkolotov