page-lab
Monitor # of workers and restart 'failed' workers
Currently, if a worker fails (for example, because of an uncaught exception), it is not re-spawned. Over time, the # of concurrent workers can therefore slowly decrease.
For instance, you might start with 12 concurrent workers running tests, but after hours of continuous test running end up with only 3 still running.
This requires these features:
- Log when a worker fails (helps determine a pattern of cause).
- Ability to monitor/view # of currently running workers (so you can verify the # running).
- Ability to re-spawn a worker if one fails or shuts down (this keeps the # of concurrent workers steady).
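
As a rough illustration of how the three features above could fit together, here is a minimal Python sketch. It is not page-lab's actual worker code: it assumes each worker is a separate OS process started from a command line, and the names `WorkerSupervisor`, `command`, `target_count`, `running_count` and `check` are hypothetical.

```python
import logging
import subprocess
import sys

log = logging.getLogger("worker_supervisor")


class WorkerSupervisor:
    """Hypothetical helper that keeps a fixed number of worker processes alive."""

    def __init__(self, command, target_count):
        self.command = command            # argv list used to start one worker (assumption)
        self.target_count = target_count  # desired # of concurrent workers
        self.workers = []                 # subprocess.Popen handles being tracked
        self.failures = []                # (pid, exit_code) of every worker that exited

    def _spawn(self):
        proc = subprocess.Popen(self.command)
        self.workers.append(proc)
        return proc

    def start(self):
        # Spawn workers until the target count is reached.
        while len(self.workers) < self.target_count:
            self._spawn()

    def running_count(self):
        # Popen.poll() returns None while the process is still running.
        return sum(1 for p in self.workers if p.poll() is None)

    def check(self):
        """Log exited workers, drop them, and spawn one replacement for each.

        Returns the number of workers that were re-spawned.
        """
        exited = [p for p in self.workers if p.poll() is not None]
        for proc in exited:
            log.warning("worker pid=%s exited with code %s", proc.pid, proc.returncode)
            self.failures.append((proc.pid, proc.returncode))
            self.workers.remove(proc)
        for _ in exited:
            self._spawn()
        return len(exited)

    def stop(self):
        # Ask running workers to terminate, then reap every tracked process.
        for proc in self.workers:
            if proc.poll() is None:
                proc.terminate()
        for proc in self.workers:
            proc.wait()
        self.workers.clear()
```

A long-running process would call `start()` once and then `check()` on a timer (say, every few seconds), exposing `running_count()` and `failures` wherever the worker count is displayed. Polling is only one option; an event-based approach that reacts to a worker's exit signal would avoid the delay between a failure and its re-spawn.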