stripe-python icon indicating copy to clipboard operation
stripe-python copied to clipboard

stripe api with `requests` leaks socket file descriptors via unclosed Session

Open asottile-sentry opened this issue 3 years ago • 9 comments

Describe the bug

the http request client is assigned to a global variable here: https://github.com/stripe/stripe-python/blob/1ae42227d9df745420c1a3db11893589d91ba83e/stripe/api_requestor.py#L103-L105

requests client is defined here: https://github.com/stripe/stripe-python/blob/1ae42227d9df745420c1a3db11893589d91ba83e/stripe/http_client.py#L287

the Session is assigned here: https://github.com/stripe/stripe-python/blob/1ae42227d9df745420c1a3db11893589d91ba83e/stripe/http_client.py#L318-L319

this Session is never closed leading to file descriptor leak and a ResourceWarning -- the correct usage of a Session is to either utilize the with statement or explicitly .close() it

To Reproduce

the simplest reproduction I can come up with is this one liner:

$ python3  -Wonce -c $'import stripe; stripe.api_key="placeholder"; import contextlib\nwith contextlib.suppress(Exception):\n stripe.Account.list()'
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('10.0.2.15', 44434), raddr=('34.200.27.109', 443)>

the ResourceWarning there is coming from the teardown of that Session object I mentioned above -- it's an unclosed connection to the stripe api:

$ nslookup 34.200.27.109
109.27.200.34.in-addr.arpa	name = api-34-200-27-109.stripe.com.

Authoritative answers can be found from:

Expected behavior

utilization of the stripe api should not lead to ResourceWarnings

Code snippets

above

OS

any, though I'm on linux

Language version

any, though I'm using 3.10.4

Library version

4.1.0

API version

N/A

Additional context

No response

asottile-sentry avatar Sep 15 '22 22:09 asottile-sentry

looks like #276 is where the issue was introduced

asottile-sentry avatar Sep 22 '22 17:09 asottile-sentry

Hi @asottile-sentry, thanks for the report!

While we agree it's not ideal, we've found that the benefits of re-using the session are worth not properly closing it on exit. If this is an issue for you, you can close the session manually before your program exits by calling the RequestsClient.close method, or use a different HTTPClient (we have a few defined, see here for configuration instructions).

We'll keep this issue open for potential improvements in the future.

yejia-stripe avatar Sep 27 '22 20:09 yejia-stripe

there isn't a way to reliably do that because the session is in a thread local (and thus leaks once per thread)

plus relying on any user of your library to have to hack around the default behaviour leaking things via global variables isn't ideal

were the "benefits" ever profiled? I suspect in practice they don't make significant difference here (requests is after all limited to H1 requests and keepalive still needs to be re-negotiated) but if you could point me to some evidence of that I'd be happy to be wrong.

if there is a benefit, I'd really like to see a first-class context manager or constructable client rather than global variables -- the current "interface" is really inconvenient and not threadsafe. I say interface lightly because "monkey patch these global variables at the start and end of your program" is barely an interface

I suspect a simpler middle-ground would be to use an ephemeral session if the user does not pass one in (closing it properly on each request) -- then if users want to manage their own session lifecycle they can do so but then the default behaviour doesn't leak socket/file descriptors. I think this is a fairly unobtrusive patch as well --

diff --git a/stripe/http_client.py b/stripe/http_client.py
index ba3838b..160d413 100644
--- a/stripe/http_client.py
+++ b/stripe/http_client.py
@@ -1,5 +1,6 @@
 from __future__ import absolute_import, division, print_function
 
+import contextlib
 import sys
 import textwrap
 import warnings
@@ -87,6 +88,15 @@ def new_default_http_client(*args, **kwargs):
     return impl(*args, **kwargs)
 
 
[email protected]
+def _session(session):
+    if session is not None:
+        yield session
+    else:
+        with requests.Session() as session:
+            yield session
+
+
 class HTTPClient(object):
     MAX_DELAY = 2
     INITIAL_DELAY = 0.5
@@ -315,19 +325,17 @@ class RequestsClient(HTTPClient):
         if is_streaming:
             kwargs["stream"] = True
 
-        if getattr(self._thread_local, "session", None) is None:
-            self._thread_local.session = self._session or requests.Session()
-
         try:
             try:
-                result = self._thread_local.session.request(
-                    method,
-                    url,
-                    headers=headers,
-                    data=post_data,
-                    timeout=self._timeout,
-                    **kwargs
-                )
+                with _session(self._session) as session:
+                    result = session.request(
+                        method,
+                        url,
+                        headers=headers,
+                        data=post_data,
+                        timeout=self._timeout,
+                        **kwargs
+                    )
             except TypeError as e:
                 raise TypeError(
                     "Warning: It looks like your installed version of the "

asottile-sentry avatar Sep 27 '22 20:09 asottile-sentry

Please bear with me as I'm not terribly familiar the topics we're discussing.

there isn't a way to reliably do that because the session is in a thread local (and thus leaks once per thread)

The RequestsClient.close method looks to me like it is using the same thread local as the session. Does this not work the way I think it does?

were the "benefits" ever profiled? I suspect in practice they don't make significant difference here [...] but if you could point me to some evidence of that I'd be happy to be wrong.

Looking at #266 (linked from the PR you mentioned), I don't believe the performance was actually tested. I did a naive test of RequestClient before this change vs the current one, and it does look like reusing the session leads to a slight decrease in request times for tens of requests sent in succession. You're right that this probably doesn't make a difference to most users in practice, so we'll need to reevaluate our implementation and its tradeoffs.

requests is after all limited to H1 requests and keepalive still needs to be re-negotiated

What are H1 requests? I haven't heard of the term, and nothing stood out during a quick search.


Thank you for the suggestions and for bringing this to our attention! We do have some ideas for improvements/fixes for this, so I'll add yours to that list for our team to discuss. Realistically, we're unlikely to work on this right away, but the team is aware of this issue and we'll budget time accordingly.

yejia-stripe avatar Sep 28 '22 14:09 yejia-stripe

Please bear with me as I'm not terribly familiar the topics we're discussing.

there isn't a way to reliably do that because the session is in a thread local (and thus leaks once per thread)

The RequestsClient.close method looks to me like it is using the same thread local as the session. Does this not work the way I think it does?

thread locals are essentially global variables per-thread. so calling close in the main thread doesn't affect instances in other threads

requests is after all limited to H1 requests and keepalive still needs to be re-negotiated

What are H1 requests? I haven't heard of the term, and nothing stood out during a quick search.

ah H1 being HTTP 1.x -- H2 being HTTP 2.x where connection pooling and reuse makes a much more significant difference

asottile-sentry avatar Sep 28 '22 14:09 asottile-sentry

thread locals are essentially global variables per-thread. so calling close in the main thread doesn't affect instances in other threads

I'm not quite following- since our custom close() calls down to self._thread_local.session.close(), won't that work on a per-thread basis? Assuming that each of your threads calls close() manually, I guess.

xavdid-stripe avatar Sep 28 '22 17:09 xavdid-stripe

thread locals are essentially global variables per-thread. so calling close in the main thread doesn't affect instances in other threads

I'm not quite following- since our custom close() calls down to self._thread_local.session.close(), won't that work on a per-thread basis? Assuming that each of your threads calls close() manually, I guess.

each thread would need to do that because they are thread locals -- meaning you cannot ...

... close the session manually before your program exits

you could do it in each thread if you have strong control over the thread runtime, but most of the time you're not going to have strong control over the threads (think like thread pools, web frameworks, etc.) -- for example django's thread-based executor for which I'd have to hack around with django internals to have the right patch point to close this (and I'm not even sure that's accessible at all since I don't think the daemon thread can even configure that).

asottile-sentry avatar Sep 28 '22 18:09 asottile-sentry

were the "benefits" ever profiled? I suspect in practice they don't make significant difference here

I very naïvely profiled this on my Macbook: https://github.com/richardm-stripe/stripe-python/pull/2 and found ~33% speedup from session re-use. Unfortunately I don't think that's insignificant enough to change the defaults for the global client.

if there is a benefit, I'd really like to see a first-class context manager or constructable client rather than global variables -- the current "interface" is really inconvenient and not threadsafe. I say interface lightly because "monkey patch these global variables at the start and end of your program" is barely an interface

This is spot on, and we have that now with StripeClient released in stripe-python v5.

The default right now is to initialize a client without a session which triggers the uncleanupable thread-local session creation, the same way as the global client does, but I think we should change this so that you get one session per StripeClient, something like https://github.com/stripe/stripe-python/pull/1261. As long as users create one StripeClient per thread, this should perform the same as the global client re: session re-use.

Following up from that we could make StripeClient a contextmanager so

with stripe.StripeClient as client:
  client.customers.create(...)

richardm-stripe avatar Feb 28 '24 00:02 richardm-stripe