checkpoint to local disk then copy to shared location on eviction / job removal

Open ahnitz opened this issue 3 years ago • 0 comments

One reason we hesitate to use some of the current checkpointing for PE / inspiral jobs when running O(10^3) or more jobs is that they are often set up to write checkpoints to a shared filesystem location. Notably, this is less of an issue when running in nonsharedfs mode with Pegasus. If we need to allow a shared fs, we could instead write the checkpoint directly to a local disk location (say tmpdir) and copy it back to the shared location only on eviction, using a handler.
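A minimal sketch of the handler idea, not existing pycbc code. It assumes eviction / job removal is delivered to the job as SIGTERM (this depends on the scheduler configuration) and that the node-local scratch directory is whatever `tempfile.gettempdir()` resolves to (it honours `TMPDIR`). All function names here are hypothetical.

```python
import os
import shutil
import signal
import sys
import tempfile


def local_checkpoint_path(name):
    """Hypothetical helper: place the checkpoint on node-local scratch."""
    return os.path.join(tempfile.gettempdir(), name)


def write_checkpoint(local_path, state):
    """Write the checkpoint to local disk atomically (temp file + rename)."""
    tmp = local_path + ".tmp"
    with open(tmp, "w") as f:
        f.write(state)
    os.replace(tmp, local_path)


def copy_back(local_path, shared_dir):
    """Copy the local checkpoint to the shared location, if one exists."""
    if not os.path.exists(local_path):
        return None
    os.makedirs(shared_dir, exist_ok=True)
    dest = os.path.join(shared_dir, os.path.basename(local_path))
    part = dest + ".part"
    shutil.copy2(local_path, part)
    # Rename last so a half-copied file is never mistaken for a checkpoint.
    os.replace(part, dest)
    return dest


def make_eviction_handler(local_path, shared_dir):
    """Build a signal handler that copies the checkpoint back, then exits."""
    def handler(signum, frame):
        copy_back(local_path, shared_dir)
        sys.exit(128 + signum)
    return handler


def install_eviction_handler(local_path, shared_dir, signum=signal.SIGTERM):
    """Register the handler (assumption: eviction arrives as SIGTERM)."""
    signal.signal(signum, make_eviction_handler(local_path, shared_dir))
```

A job would call `install_eviction_handler(...)` once at startup and then call `write_checkpoint(...)` as often as it likes, since only the local disk is touched. The shared filesystem sees one write per eviction rather than one per checkpoint interval. The copy has to finish before the scheduler escalates to a hard kill, so large checkpoints may need a longer grace period.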

We should try the Pegasus solution first, though, if it works for us: https://pegasus.isi.edu/documentation/user-guide/data-transfers.html#staging-of-job-checkpoint-files

ahnitz, Jun 25 '21 13:06