Agent Restore State

Open • DotNetRussell opened this issue 2 years ago • 0 comments

Currently, if an agent dies mid-job, it does not resume the job when it restarts. Finding a way to support this will take some investigation.

  • Agents start a subprocess, wait for it to finish, and then dump standard output to a temp file, which is then processed and returned when requested. The problem for state is that if the agent dies mid-job, the job keeps running but no longer belongs to the agent, so a restarted agent has no way to retrieve its output.
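As a rough illustration of the current flow, here is a minimal Python sketch. The agent's actual language and code are not shown in this issue, so `run_job` and its use of `subprocess.run` are assumptions, not Ensemble's implementation. What it shows is the gap: stdout only reaches disk after the child has exited.

```python
import subprocess
import tempfile


def run_job(argv: list[str]) -> str:
    """Hypothetical reconstruction of the current flow: run, wait, then dump stdout."""
    # The agent holds the read end of the child's stdout pipe and blocks here until
    # the child exits; until then the output exists only in the pipe and agent memory.
    completed = subprocess.run(argv, stdout=subprocess.PIPE)
    # Only after the child has finished is its output written to a temp file,
    # whose path is kept so the result can be processed and returned on request.
    with tempfile.NamedTemporaryFile("wb", suffix=".out", delete=False) as tmp:
        tmp.write(completed.stdout)
        return tmp.name
```

If the agent is killed while blocked inside `subprocess.run`, the temp file is never written, so there is nothing on disk for a restarted agent to pick up.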

One possible solution to this could be running all commands with "tee" and then dumping the results to a results file named with the job id as it runs. Then when the agent restarts it could check some flat file that contains running job ids and retrieve the job results

Another possible (untested) solution is to track the PID of the process and keep reading its stdout by tailing /proc/pid/fd/1. However, I believe this would not capture any output produced between the restart and the start of the tail, so it is probably a backup solution.
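A partial, Linux-only sketch of the PID-tracking idea. It covers only the two lookups a restarted agent would need: checking whether a saved PID is still alive, and resolving what that process's stdout (fd 1) currently points at through procfs. It does not implement the tailing and does not close the gap noted above. `is_alive` and `stdout_target` are hypothetical helper names.

```python
import os


def is_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        # Signal 0 delivers nothing; it only runs the existence/permission check.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but is owned by another user.
        return True
    # Caveat: PIDs get reused, so a real agent would also want to confirm this is
    # still the same job (e.g. by comparing the process start time).
    return True


def stdout_target(pid: int) -> str:
    """Return what fd 1 of the process points at, via Linux procfs."""
    # A regular file resolves to its path; a pipe shows up as 'pipe:[<inode>]'.
    return os.readlink(f"/proc/{pid}/fd/1")
```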

DotNetRussell • Aug 31 '23 13:08