bioscons

Add ensure_exists option to ensure filesystem is in sync before Command action returns

Open · metasoarous opened this issue 11 years ago · 1 comment

Some of us have noticed that scons occasionally decides that files have changed and need to be rebuilt, even when this should not be the case. This is particularly annoying with long-running jobs, or with jobs that involve some degree of randomness, because it can cause all downstream targets to be rebuilt unnecessarily.

After some snooping around, I've found that this only seems to happen when running on the cluster. Specifically, it seems to be related to the parent scons process not seeing the changes to the file (or filesystem), and consequently reading an incorrect (presumably null) MD5 hash.
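One way to test this hypothesis is to compare the MD5 digest that each process actually sees for a target. SCons's default content-based decider compares MD5 digests of file contents, so a process that cannot yet see the file effectively has no signature to compare. The helper below is a hypothetical diagnostic (visible_md5 is not part of SCons or bioscons); running it on the same path from the node that ran the job and from the machine running the parent scons process would show whether their views disagree.

```python
import hashlib
import os


def visible_md5(path, chunk_size=65536):
    """Hypothetical diagnostic: MD5 hex digest of path as this process sees it.

    Returns None when the file is not visible to this process, standing in
    for the "null" signature described above.
    """
    digest = hashlib.md5()
    try:
        with open(path, "rb") as fh:
            # Read in chunks so large outputs don't have to fit in memory.
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                digest.update(chunk)
    except FileNotFoundError:
        return None
    return digest.hexdigest()
```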

This problem can be solved by appending an action to the end of the command string that ensures the file exists before returning. Ideally, this behavior would be turned on by setting a flag on SlurmEnvironment, with the current behavior remaining the default. It should also be possible to turn it on or off for a specific Command, and to specify the maximum wait time.
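A minimal sketch of the proposed behavior, under stated assumptions. Rather than appending a shell step to the command string, it uses an SCons Python function action, which SCons calls as action(target, source, env) in the parent scons process and treats a return value of 0 or None as success. The names make_ensure_exists, max_wait, poll_interval and resolve_ensure_exists are hypothetical, not existing bioscons options.

```python
import os
import time


def make_ensure_exists(max_wait=60.0, poll_interval=0.5):
    """Hypothetical factory for an SCons function action that waits for targets."""

    def ensure_exists(target, source, env):
        # str() turns SCons File nodes (or plain strings) into paths.
        paths = [str(t) for t in target]
        deadline = time.monotonic() + max_wait
        while True:
            missing = [p for p in paths if not os.path.exists(p)]
            if not missing:
                return 0  # every target is visible to this process
            if time.monotonic() >= deadline:
                print("ensure_exists: gave up waiting for " + ", ".join(missing))
                return 1  # non-zero tells SCons the build step failed
            time.sleep(poll_interval)

    return ensure_exists


def resolve_ensure_exists(env_default, per_command=None):
    """Hypothetical: a per-Command setting wins over the environment-wide flag."""
    return env_default if per_command is None else per_command
```

Such an action would go last in an action list, e.g. ['csvsort -c this $SOURCE > $TARGET', make_ensure_exists(max_wait=120)], assuming bioscons passes list actions through to SCons unchanged (not verified here). Running the check as a function action means the waiting happens in the process whose view of the filesystem determines the recorded MD5.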

metasoarous · Feb 19 '14 22:02

Minimal reproducible test case:

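from os import path

# env is assumed to be a bioscons SlurmEnvironment; input and outdir are defined earlier.
# Step 1 runs locally.
tgt1 = env.Command(path.join(outdir, 'first.csv'), input,
        'csvcut -C other $SOURCE > $TARGET', use_cluster=False)
# Step 2 runs on the cluster (use_cluster=True).
tgt2 = env.Command(path.join(outdir, 'second.csv'), tgt1,
        'csvsort -c this $SOURCE > $TARGET', use_cluster=True)
# Step 3 runs locally and depends on the cluster-built file.
tgt3 = env.Command(path.join(outdir, 'third.csv'), [tgt2, input],
        'csvjoin -c this $SOURCES > $TARGET', use_cluster=False)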

Note that running scons on this SConscript and then, once it completes, running scons --debug=explain -n produces a message indicating that the target has changed and needs to be rebuilt.

metasoarous · Feb 19 '14 22:02