cbrain
Automatic restarting of tasks that are in the failed state - janitor
Now that we are running large datasets in a single batch, there is a greater chance of a job failing for reasons unrelated to CBRAIN (e.g. machine failure). It would be good to give users the ability to have their tasks restarted automatically on failure.
Questions:
- Which failed state(s) should trigger an automatic restart?
- Should we let users set the maximum number of restart attempts, or should we hard-code it?
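
To make the two questions concrete, here is a rough sketch of what a bounded restart policy could look like. It is written in Python for illustration only and is not CBRAIN code: the `Task` and `RestartPolicy` classes, the `janitor_pass` function, and the state names (`failed_on_cluster`, `failed_setup`, `restarting`) are all hypothetical placeholders, and the default of 3 attempts is an arbitrary assumption.

```python
from dataclasses import dataclass

# Hypothetical state names, for illustration only; the real set of
# failure states to act on is exactly what the first question asks.
DEFAULT_RETRYABLE_STATES = frozenset({"failed_on_cluster"})


@dataclass
class RestartPolicy:
    # Could be a user setting or a hard-coded constant (second question).
    max_restarts: int = 3
    retryable_states: frozenset = DEFAULT_RETRYABLE_STATES


@dataclass
class Task:
    status: str
    restart_count: int = 0


def should_restart(task: Task, policy: RestartPolicy) -> bool:
    """Restart only tasks in a retryable failed state that still have attempts left."""
    return (
        task.status in policy.retryable_states
        and task.restart_count < policy.max_restarts
    )


def janitor_pass(tasks, policy):
    """One sweep of a hypothetical janitor: mark eligible tasks for restart."""
    restarted = []
    for task in tasks:
        if should_restart(task, policy):
            task.restart_count += 1
            task.status = "restarting"
            restarted.append(task)
    return restarted


tasks = [
    Task("failed_on_cluster"),                   # eligible
    Task("failed_on_cluster", restart_count=3),  # attempts exhausted
    Task("failed_setup"),                        # state not in the retryable set
]
restarted = janitor_pass(tasks, RestartPolicy())
```

Capping the count per task avoids restarting a job forever when the failure is deterministic (e.g. bad input) rather than transient.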