wrapt_timeout_decorator

Undesired behaviour when a wrapped function (with use_signals=False) gets killed (eg OOM)

Open gerkone opened this issue 3 years ago • 3 comments

Hey, I would like to report possibly unintended behaviour: the decorator waits for the full timeout even though the process it started is no longer alive.

With use_signals=False, the Timeout class starts a child process for the decorated function but does not monitor whether it dies before the timeout. Instead, the decorator waits for the full timeout to expire and then raises the timeout exception. This can happen when @timeout is applied to memory-intensive functions that get killed by the OOM killer (especially when run in containers). More generally, it is a problem with any signal that cannot be trapped (e.g. SIGKILL, which shows up as exit code 137).
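As a side note on the exit code mentioned above, here is a minimal sketch (POSIX only, and assuming the "fork" start method is available) of what a parent sees when its child receives SIGKILL. Python's multiprocessing reports the exit code as the negative signal number, while shells and container runtimes conventionally report 128 plus the signal number, which is where 137 comes from. The function names `victim` and `run_and_kill` are made up for this illustration.

```python
import multiprocessing
import os
import signal
import time


def victim():
    # Stand-in for a memory-hungry function; it simply sleeps.
    time.sleep(30)


def run_and_kill():
    # Assumption: POSIX with the "fork" start method available.
    ctx = multiprocessing.get_context("fork")
    proc = ctx.Process(target=victim)
    proc.start()
    # SIGKILL cannot be caught or ignored by the child.
    os.kill(proc.pid, signal.SIGKILL)
    proc.join()
    return proc.exitcode


if __name__ == "__main__":
    # multiprocessing convention: negative signal number.
    print("multiprocessing exitcode:", run_and_kill())
    # Shell convention: 128 + signal number.
    print("shell-style exit code:", 128 + signal.SIGKILL)
```

The child gets no chance to notify the parent itself, so the parent has to notice the exit on its own, for example through `Process.is_alive()` or `Process.exitcode`.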

I do not have a proposed solution. I tried a few things, but without success, so I just wanted to let you know about this.

Thanks, Gianluca


Minimal (artificial) example

from wrapt_timeout_decorator import timeout
import multiprocessing
import time
import psutil


@timeout(
    10,
    use_signals=False,
    timeout_exception=TimeoutError,
)
def slow_process():
    # should have enough time to finish (5 s of work, 10 s timeout),
    # but instead it gets killed, and the timeout still expires
    print("Slow process started")
    time.sleep(5)
    print("Slow process done")


def fake_oomkiller():
    print("OOMKiller started")
    time.sleep(2)
    # kill sibling slow_process
    # hacky way to find it
    target = psutil.Process().parent().children(recursive=True)[-1]
    target.kill()
    print(f"Killed {target.pid}")


if __name__ == "__main__":
    oomkiller = multiprocessing.Process(target=fake_oomkiller, args=())
    oomkiller.start()
    slow_process()
    oomkiller.join()

This results in slow_process being killed after about 2 seconds, yet the decorator still waits for the full 10 seconds and then reports a timeout:

Slow process started
OOMKiller started
Killed 61205
Traceback (most recent call last):
[...]
TimeoutError: Function slow_process timed out after 10.0 seconds

gerkone, Jul 28 '22 06:07

Interesting. I will look into it later, since I am on holiday right now.

bitranox, Jul 28 '22 07:07

@bitranox I'd like to know if you found out why this is happening. Thank you.

tejas-celonis, Sep 16 '22 12:09

Yeah, there will be an easy solution: I will raise a ProcessError or similar in that case. I just want to clean up some other issues as well first.

bitranox, Sep 16 '22 12:09

Dear @gerkone @tejas-celonis @lodrantl, it's now implemented. Subprocesses are checked every 5 seconds to see whether they are still alive. See: subprocess-monitoring

yours sincerely

bitranox

bitranox, Jul 14 '23 17:07
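For readers who want to see the general idea behind such a liveness check, here is a minimal sketch. It is not the library's actual implementation: `run_with_timeout`, `ChildDiedError`, `_worker` and the `poll_interval` parameter are hypothetical names chosen for this illustration, and the code assumes POSIX with the "fork" start method. The parent waits for a result in slices of `poll_interval` seconds and checks after each slice whether the child is still alive, so a killed child is reported long before the timeout expires.

```python
import multiprocessing
import os
import signal
import time


class ChildDiedError(RuntimeError):
    """Hypothetical exception; not part of wrapt_timeout_decorator."""


def _worker(func, conn):
    # Runs in the child: compute the result and send it to the parent.
    conn.send(func())


def run_with_timeout(func, timeout, poll_interval=5.0):
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_worker, args=(func, child_conn))
    proc.start()
    deadline = time.monotonic() + timeout
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise TimeoutError(f"timed out after {timeout} seconds")
            # Block until a result arrives or one poll interval has passed.
            if parent_conn.poll(min(poll_interval, remaining)):
                return parent_conn.recv()
            if not proc.is_alive():
                # The child may have sent its result right before exiting.
                if parent_conn.poll():
                    return parent_conn.recv()
                raise ChildDiedError(
                    f"child exited with code {proc.exitcode}"
                )
    finally:
        if proc.is_alive():
            proc.kill()
        proc.join()
        parent_conn.close()
        child_conn.close()


def quick():
    return 42


def kill_self():
    # Simulates the OOM killer by sending SIGKILL to the current process.
    os.kill(os.getpid(), signal.SIGKILL)


if __name__ == "__main__":
    print(run_with_timeout(quick, timeout=5, poll_interval=0.5))
    try:
        run_with_timeout(kill_self, timeout=10, poll_interval=0.5)
    except ChildDiedError as exc:
        print("detected early:", exc)
```

The poll interval trades detection latency against wake-ups in the parent: with a 5 second interval, a killed child is noticed at most about 5 seconds after it dies instead of only when the timeout expires.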

@bitranox good to hear, thanks a lot.

gerkone, Jul 14 '23 20:07