elasticsearch-py
elasticsearch-py copied to clipboard
Add doc warning that releasing an `Elasticsearch` object to GC can trigger ResourceWarnings if the transport is not closed
I have an application which was creating and discarding Elasticsearch
objects on the fly in a few contexts.
Unfortunately, when GC runs and cleans up Elasticsearch
, it cleans out the transport, a urllib3 connection pool in my case, and doesn't close it first.
This doesn't cause any behavioral issues but it's a resource leak.
When we turned on pytest filterwarnings=error
, we got mysterious ResourceWarning
s in our test runs which blew up.
It took a bit of digging to figure out what was happening because those warnings were thrown in __del__
on sockets during GC.
(Cue: me learning about the existence of sys.unraisablehook
. 😱 )
I don't believe that Elasticsearch
should implement __del__
to close the transport.
urllib3 has made the decision that leaving cleanup entirely for GC is incorrect, and wrapping things to do it in elasticsearch-py would subvert that decision.
However, because Elasticsearch
is two layers of abstraction away from this warning (elasticsearch-py -> urllib3 -> socket
), it's hard to know that you really should close the transport before leaving Elasticsearch
for GC.
The introduction of __enter__
and __exit__
and close
in the 7.x-line hints at this, but it's non-obvious in the docs. Perhaps it's sufficient to add an explicit es.close()
call to the example usage? Or maybe there should be a doc note or warning about the fact that you should always close Elasticsearch
objects before discarding?
Apologies if there's already such a note -- I couldn't find one.
Also, less critically, I'm still on 6.x right now, so a note for the 6.x docs specifically to call out es.transport.close()
would have helped me. I'm happy to submit a PR for this or other changes with some guidance/input on where it should go.
Same issue here, I'm getting sporadic ResourceWarning even if Elasticsearch is still "alive" (long lived object).
Interestingly that the error appears from different unrelated places (I suppose it is because GC gets triggered randomly):
/usr/local/lib/python3.9/site-packages/pandas/core/series.py:536: ResourceWarning: unclosed <ssl.SSLSocket fd=24, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('<IP>', 38860), raddr=('<IP>', 9243)>
return self._name
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Obviously in this case, pandas has nothing to do with Elastic