bpipe
docs on how bpipe handles dependencies
I've read through several parts of the documentation and tutorial, but couldn't find a description of how dependencies are handled. If an input file is changed, does bpipe rerun that pipeline step by comparing time stamps of input and output files? If a file containing a script is changed, does bpipe do any checking for that?
Sorry if I missed it, but a pointer to a description of this would be great if one exists.
From some small tests, it uses file modification times at least.
It does use file modification timestamps. However, because modification times on some file systems are quite coarse (commonly 1 second resolution or even coarser), it also stores the time the command completes, at millisecond resolution, in a properties file for each output, found in .bpipe/outputs. When the properties file exists, it uses that stored time if the file system timestamp is ambiguous.
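As a rough illustration of the rule described above, here is a minimal Python sketch. It is not bpipe's actual code: the function name, the parameters, and the idea of passing the stored completion time as a plain integer are all assumptions made for the example, and the real layout of the files in .bpipe/outputs is not shown here.

```python
from typing import Optional


def is_up_to_date(input_mtime_ms: int,
                  output_mtime_ms: int,
                  recorded_completion_ms: Optional[int] = None,
                  fs_resolution_ms: int = 1000) -> bool:
    """Decide whether an output looks newer than its input.

    All names here are hypothetical. Times are milliseconds since the epoch.
    """
    # Clear-cut case: the two file system timestamps differ by at least
    # the (assumed) resolution of the file system, so they can be trusted.
    if abs(output_mtime_ms - input_mtime_ms) >= fs_resolution_ms:
        return output_mtime_ms > input_mtime_ms

    # Ambiguous case: fall back on the finer-grained completion time
    # recorded when the command finished, if one is available.
    if recorded_completion_ms is not None:
        return recorded_completion_ms >= input_mtime_ms

    # No finer-grained record: be conservative and treat the output as stale.
    return False
```

For example, an input and output that both report the same one-second timestamp are ambiguous on their own, and the sketch only treats the output as current when a recorded completion time at or after the input's timestamp is supplied.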
By default, it doesn't check whether the command, or any of the scripts or executables in the command, has changed. There's an experimental feature that checksums the command itself and re-executes it if the checksum is different. I ran with this for a little while, but I found, at least in my case, that it was quite painful: I frequently ended up with massive parts of pipelines re-executing unnecessarily because of trivial changes in the commands. Since it has not had much testing I consider it experimental, but I hope to fully implement it in the future, as I think it is a useful option for mature / production pipelines.
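To make the trade-off concrete, here is a minimal Python sketch of checksum-based re-execution. It assumes an MD5 digest of the raw command string; the function names and the example commands are hypothetical, and this is not how bpipe itself implements or enables the feature.

```python
import hashlib
from typing import Optional


def command_checksum(command: str) -> str:
    """Return an MD5 hex digest of the command text (assumed scheme)."""
    return hashlib.md5(command.encode("utf-8")).hexdigest()


def needs_rerun(command: str, stored_checksum: Optional[str]) -> bool:
    """Re-run when no checksum was stored or the command text has changed."""
    if stored_checksum is None:
        return True
    return command_checksum(command) != stored_checksum


# Hypothetical commands: the second differs only by an extra space.
original = "sort input.txt > output.txt"
reformatted = "sort  input.txt > output.txt"
stored = command_checksum(original)
```

With these definitions, needs_rerun(original, stored) is False while needs_rerun(reformatted, stored) is True, which shows how a purely cosmetic edit to a command can trigger re-execution of everything downstream.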
Cheers,
Simon
On Fri, Nov 13, 2015 at 3:35 AM, Gabriel A. Devenyi wrote:
From some small tests, it uses file modification times at least.
I'm very much in favour of the md5sum method for the commands, as it helps with consistency and reproducibility of the pipeline.
It lets me answer the question "did I actually recompute this pipeline after tweaking it?"