wrapt_timeout_decorator
Undesired behaviour when a wrapped function (with use_signals=False) gets killed (e.g. OOM)
Hey, I would like to report possibly unintended behaviour: the decorator waits for the full timeout even though the process it started is no longer alive.
With use_signals=False, the Timeout class starts a child process (for the decorated function) but does not monitor whether it dies before the timeout. Instead, the decorator waits for the given timeout to expire and then raises the exception.
This can happen when @timeout is applied to memory-intensive functions that may get killed by the OOM killer (especially when run in containers). More generally, it is a problem with any signal that cannot be trapped (i.e. SIGKILL, exit code 137).
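For readers unfamiliar with the "exit code 137" mentioned above: a shell reports 128 plus the signal number, and SIGKILL is signal 9. Python's multiprocessing reports the same event as a negative exit code. The following is only a small sketch to illustrate that. It assumes a POSIX system with the "fork" start method, and the names worker and kill_and_get_exitcode are made up for this example.

```python
import multiprocessing
import os
import signal
import time


def worker():
    # Stand-in for a memory-hungry function; it only sleeps.
    time.sleep(30)


def kill_and_get_exitcode():
    # Assumption: POSIX with the "fork" start method available.
    ctx = multiprocessing.get_context("fork")
    proc = ctx.Process(target=worker)
    proc.start()
    os.kill(proc.pid, signal.SIGKILL)  # SIGKILL cannot be trapped by the child
    proc.join()
    return proc.exitcode  # negative signal number when killed by a signal


if __name__ == "__main__":
    code = kill_and_get_exitcode()
    print("multiprocessing exitcode:", code)
    print("shell-style exit status:", 128 + abs(code))
```

The parent can therefore tell that the child was killed by inspecting exitcode once the process is no longer alive.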
I do not have a proposed solution; I tried a few things, but without success. I just wanted to let you know.
Thanks, Gianluca
Minimal (artificial) example
from wrapt_timeout_decorator import timeout
import multiprocessing
import time
import psutil


@timeout(
    10,
    use_signals=False,
    timeout_exception=TimeoutError,
)
def slow_process():
    # should have enough time to finish
    # but instead it gets terminated, and the timeout expires
    print("Slow process started")
    time.sleep(5)
    print("Slow process done")


def fake_oomkiller():
    print("OOMKiller started")
    time.sleep(2)
    # kill sibling slow_process
    # hacky way to find it
    target = psutil.Process().parent().children(recursive=True)[-1]
    target.kill()
    print(f"Killed {target.pid}")


if __name__ == "__main__":
    oomkiller = multiprocessing.Process(target=fake_oomkiller, args=())
    oomkiller.start()
    slow_process()
    oomkiller.join()
This results in slow_process getting killed after 2 seconds, while the decorator still waits for the full 10-second timeout to expire:
Slow process started
OOMKiller started
Killed 61205
Traceback (most recent call last):
[...]
TimeoutError: Function slow_process timed out after 10.0 seconds
Interesting. I will look into it later, since I am on holiday now.
@bitranox I'd like to know if you found out why this is happening. Thank you.
Yeah, there will be an easy solution: I will raise a ProcessError or similar in that case. I just want to brush up some other issues as well.
Dear @gerkone @tejas-celonis @lodrantl, it's now implemented. Subprocesses are checked every 5 seconds to see whether they are still alive. See: subprocess-monitoring
yours sincerely
bitranox
@bitranox good to hear, thanks a lot.