Thespian
Thespian copied to clipboard
MultiProcessQueue crashes on KeyboardInterrupt
https://github.com/thespianpy/Thespian/blob/2341eae2928a861820e9d71c3f7974bee3e34388/thespian/system/transport/MultiprocessQueueTransport.py#L211-L224
After a KeyboardInterrupt
it's not possible to get Actors from the actor system or send messages to it, so the system can't be shut down properly any more.
Stacktrace
Process SpawnProcess-1:1:
Process ThespianAdmin:
Traceback (most recent call last):
File "D:\python\python37\lib\multiprocessing\process.py", line 297, in _bootstrap
self.run()
File "D:\python\python37\lib\multiprocessing\process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "D:\~\thespian\system\multiprocCommon.py", line 210, in startAdmin
admin.run()
File "D:\~\thespian\system\admin\convention.py", line 758, in run
delay.remaining())
File "D:\~\thespian\system\transport\wakeupTransportBase.py", line 71, in run
rval = self._run_subtransport(incomingHandler, max_runtime)
File "D:\~\thespian\system\transport\wakeupTransportBase.py", line 80, in _run_subtransport
rval = self._runWithExpiry(incomingHandler)
File "D:\~\thespian\system\transport\MultiprocessQueueTransport.py", line 697, in _runWithExpiry
lambda s=self: s.run_time)
File "D:\~\thespian\system\transport\MultiprocessQueueTransport.py", line 402, in core_receive
return self.core_common_receive(incoming_handler, my_address, run_time_f)[1]
File "D:\~\thespian\system\transport\MultiprocessQueueTransport.py", line 218, in core_common_receive
QUEUE_CHECK_PERIOD))
File "D:\python\python37\lib\multiprocessing\queues.py", line 104, in get
if not self._poll(timeout):
File "D:\python\python37\lib\multiprocessing\connection.py", line 257, in poll
return self._poll(timeout)
File "D:\python\python37\lib\multiprocessing\connection.py", line 330, in _poll
return bool(wait([self], timeout))
File "D:\python\python37\lib\multiprocessing\connection.py", line 869, in wait
ready_handles = _exhaustive_wait(waithandle_to_obj.keys(), timeout)
File "D:\python\python37\lib\multiprocessing\connection.py", line 801, in _exhaustive_wait
res = _winapi.WaitForMultipleObjects(L, False, timeout)
KeyboardInterrupt
Possible solution:
add
except KeyboardInterrupt:
continue
to ignore KeyboardInterrupts and keep the system running, so the thespian user can shut down the system properly. This change worked for me. Not sure if thespian would have to do any other stuff to do in this case or if there is a better place to catch/avoid the Exception.
For the multiprocQueueBase that you are using, I would prefer to leave process-global issues like the handling of KeyboardInterrupt up to the application, since this is a policy decision that could be addressed in multiple different ways depending on the application's preference.
For the typical case, I'd recommend a try/finally block in your application:
asys = ActorSystem.ActorSystem(base='multiprocQueueBase', ...)
try:
[... main code here ...]
finally:
asys.shutdown()
I believe this should be sufficient to allow a normal Thespian shutdown on a keyboard interrupt.
If this is satisfactory, please go ahead and close this issue, but if I've misunderstood or there are additional concerns, please feel free to follow up here.
I 100% agree that the application should handle the KeyboardInterrupt. That's why I opened this issue.
I use thespian behind FastAPI, and my code for startup and shutdown looks something like this:
app = FastAPI()
@app.on_event('startup')
def on_startup():
asys = ActorSystem(...)
logger = asys.createActor(..., globalName='logger')
asys.tell(logger, LoggerInitializeMessage(...))
@app.on_event('shutdown')
def on_shutdown():
asys = ActorSystem()
logger = asys.createActor(..., globalName='logger') # never returns.
asys.shutdown() # times out after 10 seconds. ActorExitRequest() never reach the Actors
A KeyboardInterrupt triggers FastAPIs shutdown event, but before on_shutdown()
is executed, MultiProcessQueue crashes because of the very same KeyboardInterrupt. This has the consequence that I can't shut down the ActorSystem properly, because messages within the ActorSystem don't seem to reach their targets any more.
With the suggested fix on_shutdown()
works just as I would expect it to.
Sorry, I'm not clear on "With the suggested fix ...". Did you mean the fix I suggested, or the fix you suggested in your original message, and if the latter, where was it that you inserted that fix?
Also, can you provide a thespian.log
(see Section 13.3 in https://thespianpy.com/doc/using.html#hH-ce55494c-dd7a-4258-a1e8-b090c3bbb1e6) from a short run where you Ctrl-C immediately after startup but without processing any traffic?
I was referring to the fix I suggested.
Here is the thespian.log. I only snipped some file paths:
2023-01-27 10:26:15.873589 p27796 Warn Actor my_project.actors.LoggerAgent @ ActorAddr-Q.ThespianQ.b got signal: 2
2023-01-27 10:26:15.931655 p27796 ERR Actor my_project.actors.LoggerAgent @ ActorAddr-Q.ThespianQ.b transport run exception: Traceback (most recent call last):
File "D:\~\.venv37\lib\site-packages\thespian\system\actorManager.py", line 88, in run
r = self.transport.run(self.handleMessages)
File "D:\~\.venv37\lib\site-packages\thespian\system\transport\wakeupTransportBase.py", line 71, in run
rval = self._run_subtransport(incomingHandler, max_runtime)
File "D:\~\.venv37\lib\site-packages\thespian\system\transport\wakeupTransportBase.py", line 80, in _run_subtransport
rval = self._runWithExpiry(incomingHandler)
File "D:\~\.venv37\lib\site-packages\thespian\system\transport\MultiprocessQueueTransport.py", line 697, in _runWithExpiry
lambda s=self: s.run_time)
File "D:\~\.venv37\lib\site-packages\thespian\system\transport\MultiprocessQueueTransport.py", line 402, in core_receive
return self.core_common_receive(incoming_handler, my_address, run_time_f)[1]
File "D:\~\.venv37\lib\site-packages\thespian\system\transport\MultiprocessQueueTransport.py", line 218, in core_common_receive
QUEUE_CHECK_PERIOD))
File "D:\python\python37\lib\multiprocessing\queues.py", line 104, in get
if not self._poll(timeout):
File "D:\python\python37\lib\multiprocessing\connection.py", line 257, in poll
return self._poll(timeout)
File "D:\python\python37\lib\multiprocessing\connection.py", line 330, in _poll
return bool(wait([self], timeout))
File "D:\python\python37\lib\multiprocessing\connection.py", line 869, in wait
ready_handles = _exhaustive_wait(waithandle_to_obj.keys(), timeout)
File "D:\python\python37\lib\multiprocessing\connection.py", line 801, in _exhaustive_wait
res = _winapi.WaitForMultipleObjects(L, False, timeout)
InterruptedError: [Errno 4] Interrupted function call
2023-01-27 10:26:25.959071 p39380 ERR No response to Admin shutdown request; Actor system not completely shutdown
Now with log level DEBUG:
2023-01-27 12:07:21.788259 p42676 dbg ++++ Starting Admin from D:\~\.venv37\lib\site-packages\thespian\__init__.py
2023-01-27 12:07:21.789777 p42676 I ++++ Admin started @ ActorAddr-Q.ThespianQ / gen (3, 10)
2023-01-27 12:07:22.674982 p42676 dbg Admin of ReceiveEnvelope(from: ActorAddr-Q.ThespianQ.a, <class 'thespian.system.messages.multiproc.LoggerConnected'> msg: <thespian.system.messages.multiproc.LoggerConnected object at 0x000002202D084948>)
2023-01-27 12:07:22.676050 p42676 dbg actualTransmit of TransportIntent(ActorAddr-Q.~-pending-ExpiresIn_0:04:59.999896-<class 'thespian.system.messages.multiproc.EndpointConnected'>-<thespian.system.messages.multiproc.EndpointConnected object at 0x000002202D0849C8>-quit_0:04:59.999871)
2023-01-27 12:07:22.679150 p31436 dbg actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ-pending-ExpiresIn_0:04:59.999918-<class 'thespian.system.messages.admin.QueryExists'>-<thespian.system.messages.admin.QueryExists object at 0x0000018DB00C2E08>-quit_0:04:59.999898)
2023-01-27 12:07:22.680112 p42676 dbg Admin of ReceiveEnvelope(from: ActorAddr-Q.~, <class 'thespian.system.messages.admin.QueryExists'> msg: <thespian.system.messages.admin.QueryExists object at 0x000002202D084D48>)
2023-01-27 12:07:22.680112 p42676 dbg Attempting intent TransportIntent(ActorAddr-Q.~-pending-ExpiresIn_0:04:59.999978-<class 'thespian.system.messages.admin.QueryAck'>-<thespian.system.messages.admin.QueryAck object at 0x000002202D0848C8>-quit_0:04:59.999960)
2023-01-27 12:07:22.681064 p42676 dbg actualTransmit of TransportIntent(ActorAddr-Q.~-pending-ExpiresIn_0:04:59.999542-<class 'thespian.system.messages.admin.QueryAck'>-<thespian.system.messages.admin.QueryAck object at 0x000002202D0848C8>-quit_0:04:59.999529)
2023-01-27 12:07:22.682124 p31436 dbg actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ-pending-ExpiresIn_0:04:59.999921-<class 'thespian.system.messages.admin.PendingActor'>-PendingActor#1_of_Noneis"process_logger"-quit_0:04:59.999901)
2023-01-27 12:07:22.683116 p42676 dbg Admin of ReceiveEnvelope(from: ActorAddr-Q.~1, <class 'thespian.system.messages.admin.PendingActor'> msg: PendingActor#1_of_Noneis"process_logger")
2023-01-27 12:07:22.683116 p42676 I Pending Actor request received for my_project.actors.logger.actor.LoggerAgent reqs None from ActorAddr-Q.~1
2023-01-27 12:07:23.474723 p45660 I Starting Actor my_project.actors.logger.actor.LoggerAgent at ActorAddr-Q.ThespianQ.b (parent ActorAddr-Q.ThespianQ, admin ActorAddr-Q.ThespianQ, srcHash None)
2023-01-27 12:07:23.475666 p45660 dbg actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ-pending-ExpiresIn_0:04:59.999909-<class 'thespian.system.messages.multiproc.EndpointConnected'>-<thespian.system.messages.multiproc.EndpointConnected object at 0x000001A71D7CE808>-quit_0:04:59.999885)
2023-01-27 12:07:23.476686 p42676 dbg Attempting intent TransportIntent(ActorAddr-Q.~1-pending-ExpiresIn_0:04:59.999977-<class 'thespian.system.messages.admin.PendingActorResponse'>-PendingActorResponse(for None inst# 1) errCode None actual ActorAddr-Q.ThespianQ.b-quit_0:04:59.999959)
2023-01-27 12:07:23.477749 p45660 dbg actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ.a-pending-ExpiresIn_0:04:59.999903-<class 'logging.LogRecord'>-<LogRecord: my_project.actors.logger.actor, 10, D:\~\my_project\actors\logger\actor.py, 55, "Logg...-quit_0:04:59.999887)
2023-01-27 12:07:23.477749 p42676 dbg actualTransmit of TransportIntent(ActorAddr-Q.~1-pending-ExpiresIn_0:04:59.999017-<class 'thespian.system.messages.admin.PendingActorResponse'>-PendingActorResponse(for None inst# 1) errCode None actual ActorAddr-Q.ThespianQ.b-quit_0:04:59.998997)
2023-01-27 12:07:23.478738 p31436 dbg actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ.b-pending-ExpiresIn_0:04:59.999756-<class 'my_project.actors.logger.messages.ConfigMessage'>-ConfigMessage(config=<my_project.core.config.Config object at 0x0000018DADB5B508>)-quit_0:04:59.999742)
2023-01-27 12:07:23.479708 p31436 dbg actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ-pending-ExpiresIn_0:04:59.999951-<class 'thespian.system.messages.admin.PendingActor'>-PendingActor#2_of_Noneis"process_logger"-quit_0:04:59.999935)
2023-01-27 12:07:23.480759 p42676 dbg Admin of ReceiveEnvelope(from: ActorAddr-Q.~2, <class 'thespian.system.messages.admin.PendingActor'> msg: PendingActor#2_of_Noneis"process_logger")
2023-01-27 12:07:23.481766 p42676 dbg Attempting intent TransportIntent(ActorAddr-Q.~2-pending-ExpiresIn_0:04:59.999987-<class 'thespian.system.messages.admin.PendingActorResponse'>-PendingActorResponse(for None inst# 2) errCode None actual ActorAddr-Q.ThespianQ.b-quit_0:04:59.999974)
2023-01-27 12:07:23.481766 p45660 dbg ACTOR got ReceiveEnvelope(from: ActorAddr-Q.~, <class 'my_project.actors.logger.messages.ConfigMessage'> msg: ConfigMessage(config=<my_project.core.config.Config object at 0x000001A7220FC2C8>))
2023-01-27 12:07:23.482695 p42676 dbg actualTransmit of TransportIntent(ActorAddr-Q.~2-pending-ExpiresIn_0:04:59.999428-<class 'thespian.system.messages.admin.PendingActorResponse'>-PendingActorResponse(for None inst# 2) errCode None actual ActorAddr-Q.ThespianQ.b-quit_0:04:59.999418)
2023-01-27 12:07:23.482695 p45660 dbg actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ.a-pending-ExpiresIn_0:04:59.999929-<class 'logging.LogRecord'>-<LogRecord: my_project.actors.logger.actor, 30, D:\~\my_project\actors\logger\actor.py, 152, "Log...-quit_0:04:59.999908)
2023-01-27 12:07:23.483696 p45660 dbg actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ.a-pending-ExpiresIn_0:04:59.999963-<class 'logging.LogRecord'>-<LogRecord: my_project.actors.logger.actor, 10, D:\~\my_project\actors\logger\actor.py, 62, "Logg...-quit_0:04:59.999952)
2023-01-27 12:07:23.483696 p31436 dbg actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ.b-pending-ExpiresIn_0:04:59.999753-<class 'my_project.actors.logger.messages.ConfigMessage'>-ConfigMessage(config=<my_project.core.config.Config object at 0x0000018DADB5B508>)-quit_0:04:59.999734)
2023-01-27 12:07:23.485704 p45660 dbg ACTOR got ReceiveEnvelope(from: ActorAddr-Q.~, <class 'my_project.actors.logger.messages.ConfigMessage'> msg: ConfigMessage(config=<my_project.core.config.Config object at 0x000001A7220FC288>))
2023-01-27 12:07:23.485704 p45660 dbg actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ.a-pending-ExpiresIn_0:04:59.999951-<class 'logging.LogRecord'>-<LogRecord: my_project.actors.logger.actor, 30, D:\~\my_project\actors\logger\actor.py, 152, "Log...-quit_0:04:59.999938)
2023-01-27 12:07:28.832438 p45660 Warn Actor my_project.actors.logger.actor.LoggerAgent @ ActorAddr-Q.ThespianQ.b got signal: 2
2023-01-27 12:07:28.897457 p45660 ERR Actor my_project.actors.logger.actor.LoggerAgent @ ActorAddr-Q.ThespianQ.b transport run exception: Traceback (most recent call last):
File "D:\~\.venv37\lib\site-packages\thespian\system\actorManager.py", line 88, in run
r = self.transport.run(self.handleMessages)
File "D:\~\.venv37\lib\site-packages\thespian\system\transport\wakeupTransportBase.py", line 71, in run
rval = self._run_subtransport(incomingHandler, max_runtime)
File "D:\~\.venv37\lib\site-packages\thespian\system\transport\wakeupTransportBase.py", line 80, in _run_subtransport
rval = self._runWithExpiry(incomingHandler)
File "D:\~\.venv37\lib\site-packages\thespian\system\transport\MultiprocessQueueTransport.py", line 697, in _runWithExpiry
lambda s=self: s.run_time)
File "D:\~\.venv37\lib\site-packages\thespian\system\transport\MultiprocessQueueTransport.py", line 402, in core_receive
return self.core_common_receive(incoming_handler, my_address, run_time_f)[1]
File "D:\~\.venv37\lib\site-packages\thespian\system\transport\MultiprocessQueueTransport.py", line 218, in core_common_receive
QUEUE_CHECK_PERIOD))
File "D:\python\python37\lib\multiprocessing\queues.py", line 104, in get
if not self._poll(timeout):
File "D:\python\python37\lib\multiprocessing\connection.py", line 257, in poll
return self._poll(timeout)
File "D:\python\python37\lib\multiprocessing\connection.py", line 330, in _poll
return bool(wait([self], timeout))
File "D:\python\python37\lib\multiprocessing\connection.py", line 869, in wait
ready_handles = _exhaustive_wait(waithandle_to_obj.keys(), timeout)
File "D:\python\python37\lib\multiprocessing\connection.py", line 801, in _exhaustive_wait
res = _winapi.WaitForMultipleObjects(L, False, timeout)
InterruptedError: [Errno 4] Interrupted function call
2023-01-27 12:07:28.898457 p45660 dbg Attempting intent TransportIntent(ActorAddr-Q.ThespianQ-pending-ExpiresIn_0:04:59.999974-<class 'thespian.actors.ChildActorExited'>-ChildActorExited:ActorAddr-Q.ThespianQ.b-quit_0:04:59.999950)
2023-01-27 12:07:28.899457 p45660 dbg actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ-pending-ExpiresIn_0:04:59.998867-<class 'thespian.actors.ChildActorExited'>-ChildActorExited:ActorAddr-Q.ThespianQ.b-quit_0:04:59.998849)
2023-01-27 12:07:28.899457 p45660 dbg Run my_project.actors.logger.actor.LoggerAgent done
2023-01-27 12:07:29.541332 p31436 dbg actualTransmit of TransportIntent(ActorAddr-/ActorSys-pending-ExpiresIn_0:04:59.999935-<class 'thespian.actors.ActorExitRequest'>-ActorExitRequest-quit_0:04:59.999918)
2023-01-27 12:07:29.541332 p31436 I ActorSystem shutdown requested.
2023-01-27 12:07:29.542286 p31436 dbg actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ-pending-ExpiresIn_0:04:59.999943-<class 'thespian.system.messages.admin.SystemShutdown'>-<thespian.system.messages.admin.SystemShutdown object at 0x0000018DB03DBDC8>-quit_0:04:59.999929)
2023-01-27 12:07:39.544817 p31436 ERR No response to Admin shutdown request; Actor system not completely shutdown
2023-01-27 12:07:39.546833 p31436 I ActorSystem shutdown complete.