pycbc
checkpoint to local disk then copy to shared location on eviction / job removal
One reason we hesitate to rely on some of the current checkpointing for PE / inspiral jobs when running O(10^3) or more jobs is that it is often set up to write to a shared filesystem location. Notably, this is less of an issue when running in nonsharedfs mode with Pegasus. If we need to allow a shared filesystem, we could instead let the checkpoint be written directly to a local disk location (say tmpdir) and copy it back to the shared location only on eviction or job removal, by use of a handler.
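A minimal sketch of the handler idea, not existing pycbc code: the function names (`write_checkpoint`, `copy_back`, `make_eviction_handler`) and the paths are hypothetical, and it assumes the scheduler delivers SIGTERM to the job on eviction or removal and leaves enough time for the copy to finish.

```python
import os
import shutil
import signal
import sys


def write_checkpoint(state, local_path):
    """Write the checkpoint to local disk, replacing the old one atomically."""
    tmp_path = local_path + ".tmp"
    with open(tmp_path, "w") as handle:
        handle.write(state)
        handle.flush()
        os.fsync(handle.fileno())
    # os.replace is atomic when both paths are on the same filesystem.
    os.replace(tmp_path, local_path)


def copy_back(local_path, shared_path):
    """Copy the local checkpoint to the shared location; return True if copied."""
    if not os.path.exists(local_path):
        return False
    os.makedirs(os.path.dirname(shared_path) or ".", exist_ok=True)
    # Copy under a temporary name so readers never see a partial file.
    partial = shared_path + ".part"
    shutil.copy2(local_path, partial)
    os.replace(partial, shared_path)
    return True


def make_eviction_handler(local_path, shared_path):
    """Build a signal handler that copies the checkpoint back, then exits."""
    def handler(signum, frame):
        copy_back(local_path, shared_path)
        sys.exit(128 + int(signum))
    return handler


# Hypothetical usage inside a job (paths are placeholders):
#   local = os.path.join(os.environ.get("TMPDIR", "/tmp"), "job.checkpoint")
#   handler = make_eviction_handler(local, "/shared/run/job.checkpoint")
#   signal.signal(signal.SIGTERM, handler)  # assumed eviction signal
#   ... then call write_checkpoint(state, local) periodically ...
```

With this layout the shared filesystem is touched once per eviction instead of once per checkpoint interval. A job that finishes normally would still need one final `copy_back` call, or none at all if the checkpoint is no longer needed.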
That said, we should try the Pegasus solution first, if it works for us: https://pegasus.isi.edu/documentation/user-guide/data-transfers.html#staging-of-job-checkpoint-files