docs on how bpipe handles dependencies

Open stephens999 opened this issue 9 years ago • 3 comments

I've read through several parts of the documentation and tutorial, but couldn't find a description of how dependencies are handled. If an input file is changed, does bpipe rerun that pipeline step by comparing timestamps of the input and output files? If a file containing a script is changed, does bpipe do any checking for that?

Sorry if I missed it, but a pointer to a description of this would be great if one exists.

stephens999 avatar Oct 11 '15 13:10 stephens999

From some small tests, it uses file modification times at least.

gdevenyi avatar Nov 12 '15 16:11 gdevenyi

It does use file modification timestamps. However, because the modification times on some file systems are quite coarse (commonly 1 second resolution or even coarser), it also stores the time the command completes, at millisecond resolution, in the properties file for each output, which is found in .bpipe/outputs. When the properties file exists, bpipe uses that stored time if the file system timestamp is ambiguous.
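As a rough illustration of the check described above, here is a minimal Python sketch. It is not bpipe's actual code: the function name `output_is_stale`, the `recorded_completion_ms` argument (standing in for the completion time kept in the properties file), and the `fs_resolution_ms` threshold are all assumptions made for this example.

```python
import os
from typing import Optional


def output_is_stale(
    input_path: str,
    output_path: str,
    recorded_completion_ms: Optional[int] = None,  # hypothetical: time read from a properties file
    fs_resolution_ms: int = 1000,                  # assumed file system timestamp granularity
) -> bool:
    """Return True if the output looks older than its input and should be rebuilt."""
    # A missing output always needs to be (re)built.
    if not os.path.exists(output_path):
        return True

    # Modification times in whole milliseconds.
    input_ms = os.stat(input_path).st_mtime_ns // 1_000_000
    output_ms = os.stat(output_path).st_mtime_ns // 1_000_000

    # If the two timestamps are closer than the file system can resolve,
    # the comparison is ambiguous, so prefer the recorded completion time.
    ambiguous = abs(input_ms - output_ms) < fs_resolution_ms
    if ambiguous and recorded_completion_ms is not None:
        return input_ms > recorded_completion_ms

    # Otherwise fall back to the plain file system timestamps.
    return input_ms > output_ms
```

With this sketch, an input and output that share the same coarse timestamp are only treated as out of date when the recorded completion time says the input is newer.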

It doesn't, by default, check whether the command or any of the scripts or executables in the command have changed. There's an experimental feature that checksums the command itself and re-executes it if the checksum is different. I ran with this for a little while, but I found, at least in my case, that it was quite painful: I frequently ended up with massive parts of pipelines re-executing unnecessarily because of trivial changes in the commands. Since it has not had much testing I consider it experimental, but I hope to fully implement it in the future, as I think it is a useful option for mature / production pipelines.
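To make the trade-off concrete, here is a small Python sketch of command checksumming in general. It does not reflect how bpipe's experimental feature is implemented: the function names, the use of MD5, and the idea of a stored checksum string are assumptions for illustration only.

```python
import hashlib
from typing import Optional


def command_checksum(command: str) -> str:
    """Checksum of the exact command text (MD5 chosen only for illustration)."""
    return hashlib.md5(command.encode("utf-8")).hexdigest()


def command_changed(command: str, stored_checksum: Optional[str]) -> bool:
    """Return True if the command differs from the one recorded on the last run.

    A missing record (None) is treated as "changed" in this sketch.
    """
    if stored_checksum is None:
        return True
    return command_checksum(command) != stored_checksum


# Hypothetical commands: the second differs only by an extra space.
previous = command_checksum("sort input.txt > output.txt")
print(command_changed("sort input.txt > output.txt", previous))
print(command_changed("sort  input.txt > output.txt", previous))
```

Because the checksum covers the literal text, even a cosmetic edit such as the extra space in the second call counts as a change, which is the kind of trivial difference that can trigger large re-executions.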

Cheers,

Simon

ssadedin avatar Nov 12 '15 23:11 ssadedin

I'm very much in favour of the md5sum method for the commands, as it helps with consistency and reproducibility of the pipeline.

It lets me answer the question "did I actually recompute this pipeline after tweaking it?"

gdevenyi avatar Jan 21 '16 21:01 gdevenyi