omero-py icon indicating copy to clipboard operation
omero-py copied to clipboard

Prevent hang on exit while omero.client keepalive is active

Open DavidStirling opened this issue 1 year ago • 5 comments

When using omero.client best practice appears to be to have everything in a try block and explicitly call client.closeSession to shut everything down nicely, however users don't always remember to do this.

If a user has called client.enableKeepAlive(x) then exiting without calling closeSession currently seems to cause Python to hang instead of exiting.

Minimal example:

import omero
client = omero.client(host="my.omero.server")
session = client.createSession(username="user.name", password="my.pwd")
client.enableKeepAlive(5)
print("Session is active: ", client.getSessionId())
# A normal exit behaves similarly, but this is clearer in the console
raise Exception("A forced crash")

At least on my end, executing this script will throw the exception but fail to exit the interpreter.

Having investigated this, I believe that the task thread within the omero.util.Resources class fails to shut down when the main thread exits. The result is that, while the interpreter is waiting for all threads to close so that it can exit, the keepalive thread keeps spinning. It's currently set up to wait for an explicit close signal from the main thread, but this obviously never comes if that thread has stopped. You therefore get a hang.

As a quick fix, making these worker threads into daemon threads ensures that they die if the main thread exits, thus fixing the problem. I'm not sure how acceptable this is as the Resources class seems to be used for other repeating processes, but I couldn't find a scenario where these should also continue even in the absence of the main thread. If that's not the case then you could instead build some sort of "is the main thread alive?" check into the task.

It might actually be worth a broader review of omero-py's cleanup/shutdown routines. It looks like much of that code is rather old and so modern Python could have more effective ways to manage connections.

DavidStirling avatar Jul 31 '24 11:07 DavidStirling

It might actually be worth a broader review of omero-py's cleanup/shutdown routines. It looks like much of that code is rather old

Definitely! 😄

... and so modern Python could have more effective ways to manage connections.

:+1:

I'm not sure how acceptable this is as the Resources class seems to be used for other repeating processes, but I couldn't find a scenario where these should also continue even in the absence of the main thread.

Resources was originally built (IIRC) for use in the Processor service. A review of whether or not there's anything that those tasks need time to do might be worth it. Even if that is the case, though, we could certainly have this as a parameter to separate between server and client side.

joshmoore avatar Jul 31 '24 14:07 joshmoore

Having looked at this in a lot more detail I'm pretty confident this is just a bug fix. There is more detailed explanation here:

  • https://stackoverflow.com/questions/58910372/script-stuck-on-exit-when-using-atexit-to-terminate-threads
  • https://stackoverflow.com/questions/3713360/python-2-6-x-theading-signals-atexit-fail-on-some-versions/3765160#3765160
  • https://bugs.python.org/issue1722344

The summary is that our code that is in question here was written in the Python 2.4 era and sometime around Python 2.6.5 the semantics of how sys.exit is handled changed. This change resulted more or less in the standard patterns being applied to interpreter exit where they were not previously. The standard pattern for cleanup in the threading module is to join all non-daemon threads. Any thread that a Resources class spawns currently is not a daemon thread and must have the atexit handler triggered to be joinable. Consequently, we deadlock.

The only other place I see omero.util.Resources being used besides the client object keep alives is for instances of Servant. These too would currently deadlock if sys.exit were initiated but that doesn't happen as they are shut down explicitly. I'll check more thoroughly but I can't see a downside to daemon task thread being used there especially with the current implementation's reliance on an atexit handler.

chris-allan avatar Aug 02 '24 17:08 chris-allan

For reference, current uses of Servant are:

  • omero.processor.ProcessorI (uses Resources for session management)
  • omero.tables.TablesI (uses Resources for TableI servant, which uses SimpleServant without Resources, management)

chris-allan avatar Aug 05 '24 08:08 chris-allan

Detailed rationale for the logging changes in 0f8fe1cf7fd80f6bdaa7788a5f0b15dcb6d542ec. Future deprecation, refactoring and cleanup of omero.util.concurrency.AtExitEvent and omero.util.concurrency.get_event() should definitely be done; the code makes Python 2.4 era assumptions.

chris-allan avatar Aug 05 '24 09:08 chris-allan

Without this PR, calling client.enableKeepAlive(5) prevents interpreter from exiting with Ctrl-D. When this is forced with Ctrl-C I get...

(omeroweb2) Williams-MacBook-Pro:zarr-python wmoore$ python
Python 3.9.15 | packaged by conda-forge | (main, Nov 22 2022, 08:50:29) 
[Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import omero
>>> client = omero.client(host="localhost")
>>> session = client.createSession(username="user", password="pass")
>>> client.enableKeepAlive(5)
>>> 
^CException ignored in: <module 'threading' from '/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/threading.py'>
Traceback (most recent call last):
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/threading.py", line 1477, in _shutdown
    lock.acquire()

Updating to this branch fixes that behaviour (clean exit with Ctrl-D).

👍

will-moore avatar Aug 05 '24 15:08 will-moore

Released in 5.19.5

jburel avatar Sep 16 '24 14:09 jburel