easybuild-framework

Proposed changes to add a `timeout` to `run_shell_cmd`

Open • Crivella opened this issue 1 year ago • 4 comments

These changes would let run_shell_cmd fail after a user-specified timeout. This is useful for commands that are known to hang occasionally: they would fail gracefully and report meaningful logs instead of blocking EB indefinitely.

The PR leverages the timeout already implemented in Popen.communicate (available since Python 3.3) for usage of run_shell_cmd that is neither interactive nor streamed.
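For illustration, a minimal sketch of the Popen.communicate timeout pattern on its own, outside of run_shell_cmd. The function name `run_with_timeout` and its return tuple are assumptions made for this example, not part of the PR.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout):
    """Hypothetical helper: run `cmd` (a list of arguments) with a time limit.

    Returns (exit_code, output_bytes, timed_out).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    try:
        # communicate() raises TimeoutExpired if the process has not
        # finished within `timeout` seconds
        out, _ = proc.communicate(timeout=timeout)
        return proc.returncode, out, False
    except subprocess.TimeoutExpired:
        # communicate() does not stop the child on timeout, so stop it
        # explicitly and then collect whatever output was produced
        proc.kill()
        out, _ = proc.communicate()
        return proc.returncode, out, True


if __name__ == "__main__":
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(sleeper, timeout=0.5))
```

Note that the child is not killed automatically when the timeout expires, which is why the sketch calls kill() before the second communicate().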

In case of an interactive/streamed run the following has been added:

  • 2 functions (+1 helper) in run.py
    • read_pipe: read from a pipe using a thread and raise a TimeoutError in case the read operation is taking longer than the specified timeout
    • terminate_process: attempt to terminate a process gracefully by calling Popen.terminate first and Popen.kill afterwards. Raises an EasyBuildError if the process is still alive after the timeout
  • 3 checks inside run_shell_cmd
    • Check at the beginning of the while loop to see whether the process has been alive for longer than timeout
    • Check when reading stdout to ensure the read does not block for longer than the time remaining before the process would exceed timeout
    • Check when reading stderr to ensure the read does not block for longer than the time remaining before the process would exceed timeout
  • 2 tests to ensure that both usages succeed/fail when expected.
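The pieces listed above could fit together roughly as follows. This is a sketch under stated assumptions, not the code from the PR: the signatures of `read_pipe` and `terminate_process` are guessed from the descriptions, `stream_with_timeout` is a hypothetical stand-in for the loop inside run_shell_cmd, and RuntimeError is used where the PR raises an EasyBuildError.

```python
import subprocess
import sys
import threading
import time


def read_pipe(pipe, size, timeout=None):
    """Read up to `size` bytes from `pipe` in a thread (signature assumed).

    Raises TimeoutError if the read takes longer than `timeout` seconds.
    """
    result, errors = [], []

    def _read():
        try:
            result.append(pipe.read(size))
        except Exception as err:  # hand read errors back to the caller
            errors.append(err)

    worker = threading.Thread(target=_read, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        raise TimeoutError(f"read did not complete within {timeout} seconds")
    if errors:
        raise errors[0]
    return result[0]


def terminate_process(proc, timeout=5):
    """Try Popen.terminate first, then Popen.kill (signature assumed)."""
    for stop in (proc.terminate, proc.kill):
        stop()
        try:
            proc.wait(timeout=timeout)
            return
        except subprocess.TimeoutExpired:
            continue
    # Stand-in for the EasyBuildError raised in the PR
    raise RuntimeError(f"process {proc.pid} still alive after terminate and kill")


def stream_with_timeout(cmd, timeout, chunk=1024):
    """Hypothetical streaming loop showing the deadline checks."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    start = time.monotonic()
    output = b""
    try:
        while True:
            # Check 1: has the process already used up its time budget?
            remaining = timeout - (time.monotonic() - start)
            if remaining <= 0:
                raise TimeoutError(f"command exceeded {timeout} seconds")
            # Checks 2 and 3 (stderr is merged into stdout here):
            # never block on a read for longer than the time that is left
            data = read_pipe(proc.stdout, chunk, timeout=remaining)
            if not data:  # empty read means EOF
                break
            output += data
    except TimeoutError:
        terminate_process(proc)
        raise
    proc.wait()  # a fuller version would bound this wait as well
    return proc.returncode, output
```

Reading in a thread is what makes the timeout enforceable here: a plain pipe read can block indefinitely, while Thread.join accepts a timeout and lets the caller give up and terminate the process.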

Crivella, Oct 01 '24 13:10