Close function doesn't actually close everything
Is this a support request? Support Ticket #31907
Describe the bug
We are running a python flask app on a set of linux servers and in an AWS lambda. In our app, we create a ldclient instance and try to close() it when we are done. The ldclient.get().close() function call didn't close all threads. The left over threads pile up and over time hit the system thread limit, because the application process is still running and is called on a new task. We start getting an error of RuntimeError: can't start new thread
In short, the bug is that the close() function didn't close all threads. If set_config get's called a 2nd time, as documented, a new ldclient instance is created, the old one never completely goes away.
To reproduce
import atexit
import ldclient
import threading
def __set_launchdarkly_configuration(app):
ld_key = config.envars.get("LAUNCHDARKLY_SDK_KEY")
if ld_key:
ld_config = ldclient.config.Config(ld_key)
else:
ld_config = ldclient.config.Config("NOT_CONFIGURED", offline=True)
logger.warning(
"LAUNCHDARKLY_SDK_KEY is not set, feature flags will always return defaults"
)
ldclient.set_config(ld_config)
atexit.register(lambda: ldclient.get().close())
@atexit.register
def teardown_launchdarkly_configuration(exception=None):
logger.info(f"teardown launchdarkly_configuration, exception?: {exception}")
logger.info(f"Number of active threads: {threading.active_count()}")
# Call __set_launchdarkly_configuration function, ldclient.get(), and client.variation
Expected behavior
Ultimately, I expected the function ldclient.get().close() to close all threads and release all ldclient objects. I would expect my next ldclient.get().is_initialized() to be false because I closed ldclient so it should be in a state where I have to call set_config
I expected ldclient.set_config(ld_config) to be smart enough to only create new instances if there doesn't exist an active instance or if the config values changed from the previous call. I know that ldclient.get() enforces the singleton pattern, then I would expect `ldclient.set_config() to as well.
Maybe there should be something like a get_config function to allow me to verify that the settings in code are the same as the active config and then I could re-use the existing ldclient instead of creating a new one.
Logs
builtins.RuntimeError: can't start new thread
Traceback (most recent call last):
<removed>
File /function/ldclient/__init__.py, line 42, in set_config
{code_line}
File /function/ldclient/client.py, line 97, in __init__
{code_line}
File /function/ldclient/client.py, line 120, in _set_event_processor
{code_line}
File /function/ldclient/event_processor.py, line 396, in __init__
{code_line}
File /function/ldclient/repeating_timer.py, line 17, in start
{code_line}
File /usr/local/lib/python3.7/threading.py, line 852, in start
{code_line}
RuntimeError: can't start new thread
SDK version We updated to version 7.5.1 and still saw this error. We tried v8.0.0 as well, same issue.
Language version, developer tools Python 3.7
OS/platform
Docker container running python:3.7-slim and Ubuntu 20.04
Additional context
It looks to me like this is probably a very long-standing problem with the implementation of close(). I think the thread that is not being stopped is the one that reads the flag data SSE stream. Since the server has not closed the connection, this thread is blocked waiting for more data. If I'm right, then using the SDK in polling mode (set stream=False in the configuration), so that the request does not stay open, would mean you would not see any lingering threads after calling close(). That wouldn't be a long-term solution, just a way to test the theory.
The obvious solution would be to force a close of the connection from the client side, but I've been having trouble finding a workable way to do this— urllib3's HTTPResponse.close() appears to simply hang if you try to call it from thread A (in this case the application thread that's trying to close the LDClient) while thread B (the SSE reader) is doing a blocking read. I'm still investigating what the right way is to do this; urllib3 documentation and Python documentation are not very clear on this subject since it's a somewhat unusual use case.
We tried stream=False and it seemed to not help.
@agitelzon Hmm— that's unexpected, so maybe there is something else going on. We're continuing to investigate.
Can you tell me exactly how many excess threads you are seeing?
Also— I missed one detail in your original post that I should have commented on. You said:
I expected
ldclient.set_config(ld_config)to be smart enough to only create new instances if there doesn't exist an active instance or if the config values changed from the previous call. I know thatldclient.get()enforces the singleton pattern, then I would expectldclient.set_config()to as well.
That is not the documented behavior of set_config. Please see the documentation:
Sets the configuration for the shared SDK client instance. If this is called prior to ldclient.get(), it stores the configuration that will be used when the client is initialized. If it is called after the client has already been initialized, the client will be re-initialized with the new configuration (this will result in the next call to ldclient.get() returning a new client instance).
In other words, if you only want the client to be created once, and the configuration has not changed, you should not be calling set_config more than once.
Can you tell me exactly how many excess threads you are seeing?
Every time a new job would run in the code, we'd see 11 new threads created, 9 threads shutdown, and 2 threads would stick around forever. The next time a job would run, again 11 new, 9 closed, 2 left over.
This would keep happening until the application ran out of threads in the system.
In other words, if you only want the client to be created once, and the configuration has not changed, you should not be calling set_config more than once.
For now, our workaround was this block of code at the beginning of the __set_launchdarkly_configuration function.
try:
if ldclient.get().is_initialized():
logger.info("LAUNCHDARKLY already initialized")
return
except Exception as e:
logger.info("LAUNCHDARKLY not initialized yet. Setting config...")
pass
Thanks. If it's 5 threads, then my original guess about what was going on was wrong, but that does give us a clue as to what else to look into.
Sorry if this is a silly question, but... in the first code excerpt you posted, is it for sure that the ldclient.get().close() is happening before the teardown_launchdarkly_configuration method runs and prints the thread count? I'm not sure how atexit hooks are ordered when one of them is from a method annotation.
Thanks. If it's 5 threads, then my original guess about what was going on was wrong, but that does give us a clue as to what else to look into.
Please take a look at my post again. I went back through our logs and got an accurate count and updated my post.
Sorry if this is a silly question, but... in the first code excerpt you posted, is it for sure that the ldclient.get().close() is happening before the teardown_launchdarkly_configuration method runs and prints the thread count? I'm not sure how atexit hooks are ordered when one of them is from a method annotation.
We went through a few iterations with testing the block of code and putting in print statements. I just grabbed a simplified version to post here. At first, we had the ldclient.get().close() statement in the function block with print statements before and after, so we could be sure on the number of threads. Once we had a good idea that the culprit was ldclient, we removed most of the print statements to clean up the code.
I could update the sample code with more statements if that would be helpful.
Is this still being looked into? Looks like the SDK still has this issue where closing the LD client won't close the background stream.
Is this still being looked into? Looks like the SDK still has this issue where closing the LD client won't close the background stream.
Yes, we are currently in discussions with the urllib3 developers regarding this issue. We are working with them to extend some functionality to address this issue.
We will let you know as soon as we have this resolved!
@agitelzon and @ns-saggarwal that urllib3 developers released a fix for this issue, which we then adopted into our SDK.
To resolve this issue, you must have the following:
- Python 3.9+
- urllib3 2.3.0+
- LD eventsource 1.2.1
- LD SDK 9.8.1
Thank you for your patience while we worked through those issues.