page-lab

Monitor # of workers and restart 'failed' workers

Open ecumike opened this issue 6 years ago • 0 comments

Currently, if a worker fails (for example, because of an uncaught exception), it is not re-spawned. Over time, the # of concurrent workers can therefore slowly decrease.

For instance, you might start with 12 concurrent workers running tests, but after hours of continuous test running you might end up with only 3 still running.

This requires these features:

  • Log when a worker fails (helps determine a pattern of cause).
  • Ability to monitor/view # of currently running workers (so you can verify the # running).
  • Ability to re-spawn a worker if one fails or shuts down (this keeps the # of concurrent workers steady).
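
To make the three features concrete, here is a minimal sketch of one possible shape for them. It is not taken from the project's code: `WorkerSupervisor`, `running()` and `check()` are hypothetical names, and it assumes workers can be modelled as Python threads running a single callable. The real workers may well be separate processes, in which case the same structure would apply with a process handle in place of the thread.

```python
import logging
import threading

log = logging.getLogger("worker_supervisor")


class WorkerSupervisor:
    """Hypothetical helper that keeps a fixed number of worker threads alive."""

    def __init__(self, target, size):
        self.target = target   # callable that each worker runs
        self.size = size       # desired # of concurrent workers
        self.restarts = 0      # how many workers have been re-spawned so far
        self._workers = []

    def _run(self):
        try:
            self.target()
        except Exception:
            # Feature 1: log the failure with its traceback so that
            # patterns in the cause can be spotted later.
            log.exception("worker %s failed", threading.current_thread().name)

    def _spawn(self):
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()
        return worker

    def start(self):
        self._workers = [self._spawn() for _ in range(self.size)]

    def running(self):
        # Feature 2: the # of workers that are currently alive.
        return sum(w.is_alive() for w in self._workers)

    def check(self):
        # Feature 3: replace every worker that has stopped, then
        # report the # of workers running after the replacements.
        for i, worker in enumerate(self._workers):
            if not worker.is_alive():
                self._workers[i] = self._spawn()
                self.restarts += 1
        return self.running()
```

Calling `check()` on a timer (say, every few seconds from the main loop) would keep the pool at its configured size, and `running()` and `restarts` could be exposed on whatever status view is available. Note that this sketch also re-spawns workers that exit cleanly, which matches the "fails or shuts down" wording above but may not be wanted in every case.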

ecumike, Nov 02 '18 00:11