signac-flow

Add a potential 'Corrupted' state for jobs in the status check and allow submit for unaffected jobs

Open zhou-pj opened this issue 4 years ago • 3 comments

Feature description

This suggestion came up in a discussion with @lyrivera about a recent breakdown of the /scratch file system, during which some files in my workspace were not accessible. Running python project.py status or python project.py submit fails entirely if any job file is inaccessible. It would be great to have something similar to signac's project.check() incorporated here, so that the status check can continue and label the affected jobs as CORRUPTED, and the submit process can pick out the unaffected jobs and continue to work.
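To make the request concrete, here is a rough standard-library sketch of the behavior I have in mind. It is not signac-flow code: the names classify_jobs, unaffected_jobs, OK, and CORRUPTED are hypothetical, and it assumes each job directory in the workspace holds a signac_statepoint.json file whose readability is a reasonable proxy for the job being intact.

```python
import json
from pathlib import Path

# Hypothetical labels for illustration; signac-flow does not define these.
OK = "OK"
CORRUPTED = "CORRUPTED"


def classify_jobs(workspace):
    """Map each job directory name in the workspace to OK or CORRUPTED.

    Assumption: a job is treated as corrupted if its state point file is
    missing, unreadable, or not valid JSON. Errors are caught per job so
    that one bad job does not abort the whole scan.
    """
    status = {}
    for entry in sorted(Path(workspace).iterdir()):
        try:
            if not entry.is_dir():
                continue  # skip stray files in the workspace
            with open(entry / "signac_statepoint.json") as f:
                json.load(f)
        except (OSError, ValueError):
            # OSError covers I/O failures; ValueError covers malformed JSON.
            status[entry.name] = CORRUPTED
        else:
            status[entry.name] = OK
    return status


def unaffected_jobs(status):
    """Return the job ids that would be safe to pass on to submission."""
    return [job_id for job_id, state in status.items() if state == OK]
```

The key design point is that the try/except sits inside the loop, so status can report every job and submit can proceed with whatever unaffected_jobs returns.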

Additional context

The related /scratch system incident that sparked this need: https://portal.tacc.utexas.edu/user-news/-/news/103216

zhou-pj, Feb 28 '20 16:02

I'd like to contribute to this once a decision has been made on how to proceed.

lyrivera, Mar 02 '20 20:03

@lyrivera I don't think anything should stop you and @zhou-pj from going ahead with this. Please feel free to either propose a more detailed plan on how to achieve this that can be discussed on this issue or provide a draft implementation directly.

csadorf, Mar 03 '20 07:03

@csadorf Thanks, we will discuss this further and come up with a plan.

lyrivera, Mar 03 '20 14:03