python-server-sdk icon indicating copy to clipboard operation
python-server-sdk copied to clipboard

Close function doesn't actually close everything

Open agitelzon opened this issue 2 years ago • 9 comments

Is this a support request? Support Ticket #31907

Describe the bug We are running a python flask app on a set of linux servers and in an AWS lambda. In our app, we create a ldclient instance and try to close() it when we are done. The ldclient.get().close() function call didn't close all threads. The left over threads pile up and over time hit the system thread limit, because the application process is still running and is called on a new task. We start getting an error of RuntimeError: can't start new thread

In short, the bug is that the close() function didn't close all threads. If set_config get's called a 2nd time, as documented, a new ldclient instance is created, the old one never completely goes away.

To reproduce

import atexit
import ldclient
import threading

def __set_launchdarkly_configuration(app):
    ld_key = config.envars.get("LAUNCHDARKLY_SDK_KEY")
    if ld_key:
        ld_config = ldclient.config.Config(ld_key)
    else:
        ld_config = ldclient.config.Config("NOT_CONFIGURED", offline=True)
        logger.warning(
            "LAUNCHDARKLY_SDK_KEY is not set, feature flags will always return defaults"
        )
    ldclient.set_config(ld_config)
    atexit.register(lambda: ldclient.get().close())

    @atexit.register
    def teardown_launchdarkly_configuration(exception=None):
        logger.info(f"teardown launchdarkly_configuration, exception?: {exception}")
        logger.info(f"Number of active threads: {threading.active_count()}")
 
# Call __set_launchdarkly_configuration function, ldclient.get(), and client.variation

Expected behavior Ultimately, I expected the function ldclient.get().close() to close all threads and release all ldclient objects. I would expect my next ldclient.get().is_initialized() to be false because I closed ldclient so it should be in a state where I have to call set_config

I expected ldclient.set_config(ld_config) to be smart enough to only create new instances if there doesn't exist an active instance or if the config values changed from the previous call. I know that ldclient.get() enforces the singleton pattern, then I would expect `ldclient.set_config() to as well.

Maybe there should be something like a get_config function to allow me to verify that the settings in code are the same as the active config and then I could re-use the existing ldclient instead of creating a new one.

Logs

builtins.RuntimeError: can't start new thread
Traceback (most recent call last):
<removed>
File /function/ldclient/__init__.py, line 42, in set_config
	{code_line}
File /function/ldclient/client.py, line 97, in __init__
	{code_line}
File /function/ldclient/client.py, line 120, in _set_event_processor
	{code_line}
File /function/ldclient/event_processor.py, line 396, in __init__
	{code_line}
File /function/ldclient/repeating_timer.py, line 17, in start
	{code_line}
File /usr/local/lib/python3.7/threading.py, line 852, in start
	{code_line}
RuntimeError: can't start new thread

SDK version We updated to version 7.5.1 and still saw this error. We tried v8.0.0 as well, same issue.

Language version, developer tools Python 3.7

OS/platform Docker container running python:3.7-slim and Ubuntu 20.04

Additional context

agitelzon avatar Jan 05 '23 17:01 agitelzon

It looks to me like this is probably a very long-standing problem with the implementation of close(). I think the thread that is not being stopped is the one that reads the flag data SSE stream. Since the server has not closed the connection, this thread is blocked waiting for more data. If I'm right, then using the SDK in polling mode (set stream=False in the configuration), so that the request does not stay open, would mean you would not see any lingering threads after calling close(). That wouldn't be a long-term solution, just a way to test the theory.

The obvious solution would be to force a close of the connection from the client side, but I've been having trouble finding a workable way to do this— urllib3's HTTPResponse.close() appears to simply hang if you try to call it from thread A (in this case the application thread that's trying to close the LDClient) while thread B (the SSE reader) is doing a blocking read. I'm still investigating what the right way is to do this; urllib3 documentation and Python documentation are not very clear on this subject since it's a somewhat unusual use case.

eli-darkly avatar Jan 05 '23 22:01 eli-darkly

We tried stream=False and it seemed to not help.

agitelzon avatar Jan 06 '23 19:01 agitelzon

@agitelzon Hmm— that's unexpected, so maybe there is something else going on. We're continuing to investigate.

eli-darkly avatar Jan 06 '23 20:01 eli-darkly

Can you tell me exactly how many excess threads you are seeing?

eli-darkly avatar Jan 06 '23 20:01 eli-darkly

Also— I missed one detail in your original post that I should have commented on. You said:

I expected ldclient.set_config(ld_config) to be smart enough to only create new instances if there doesn't exist an active instance or if the config values changed from the previous call. I know that ldclient.get() enforces the singleton pattern, then I would expect ldclient.set_config() to as well.

That is not the documented behavior of set_config. Please see the documentation:

Sets the configuration for the shared SDK client instance. If this is called prior to ldclient.get(), it stores the configuration that will be used when the client is initialized. If it is called after the client has already been initialized, the client will be re-initialized with the new configuration (this will result in the next call to ldclient.get() returning a new client instance).

In other words, if you only want the client to be created once, and the configuration has not changed, you should not be calling set_config more than once.

eli-darkly avatar Jan 06 '23 20:01 eli-darkly

Can you tell me exactly how many excess threads you are seeing?

Every time a new job would run in the code, we'd see 11 new threads created, 9 threads shutdown, and 2 threads would stick around forever. The next time a job would run, again 11 new, 9 closed, 2 left over.

This would keep happening until the application ran out of threads in the system.

In other words, if you only want the client to be created once, and the configuration has not changed, you should not be calling set_config more than once.

For now, our workaround was this block of code at the beginning of the __set_launchdarkly_configuration function.

    try:
        if ldclient.get().is_initialized():
            logger.info("LAUNCHDARKLY already initialized")
            return
    except Exception as e:
        logger.info("LAUNCHDARKLY not initialized yet. Setting config...")
        pass

agitelzon avatar Jan 07 '23 05:01 agitelzon

Thanks. If it's 5 threads, then my original guess about what was going on was wrong, but that does give us a clue as to what else to look into.

eli-darkly avatar Jan 09 '23 17:01 eli-darkly

Sorry if this is a silly question, but... in the first code excerpt you posted, is it for sure that the ldclient.get().close() is happening before the teardown_launchdarkly_configuration method runs and prints the thread count? I'm not sure how atexit hooks are ordered when one of them is from a method annotation.

eli-darkly avatar Jan 09 '23 18:01 eli-darkly

Thanks. If it's 5 threads, then my original guess about what was going on was wrong, but that does give us a clue as to what else to look into.

Please take a look at my post again. I went back through our logs and got an accurate count and updated my post.

Sorry if this is a silly question, but... in the first code excerpt you posted, is it for sure that the ldclient.get().close() is happening before the teardown_launchdarkly_configuration method runs and prints the thread count? I'm not sure how atexit hooks are ordered when one of them is from a method annotation.

We went through a few iterations with testing the block of code and putting in print statements. I just grabbed a simplified version to post here. At first, we had the ldclient.get().close() statement in the function block with print statements before and after, so we could be sure on the number of threads. Once we had a good idea that the culprit was ldclient, we removed most of the print statements to clean up the code.

I could update the sample code with more statements if that would be helpful.

agitelzon avatar Jan 09 '23 22:01 agitelzon

Is this still being looked into? Looks like the SDK still has this issue where closing the LD client won't close the background stream.

ns-saggarwal avatar Nov 29 '24 12:11 ns-saggarwal

Is this still being looked into? Looks like the SDK still has this issue where closing the LD client won't close the background stream.

Yes, we are currently in discussions with the urllib3 developers regarding this issue. We are working with them to extend some functionality to address this issue.

We will let you know as soon as we have this resolved!

keelerm84 avatar Dec 02 '24 17:12 keelerm84

@agitelzon and @ns-saggarwal that urllib3 developers released a fix for this issue, which we then adopted into our SDK.

To resolve this issue, you must have the following:

  • Python 3.9+
  • urllib3 2.3.0+
  • LD eventsource 1.2.1
  • LD SDK 9.8.1

Thank you for your patience while we worked through those issues.

keelerm84 avatar Jan 22 '25 20:01 keelerm84