Rohan
@MikeDacre: There's a `--signal` option for `sbatch` that makes SLURM send a specified signal at a given time before the job's end time. See: https://computing.llnl.gov/linux/slurm/sbatch.html. You could use this...
@MikeDacre: A (possibly) simpler alternative is to write a small script (Python/Bash?) that starts a timer when the job starts running and calls `dmtcp_command --checkpoint --port= --host=` when the timer...
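A rough sketch of that timer idea in Python, for illustration only. It assumes the timer should fire after a fixed delay chosen to fall before the job's time limit, and that `dmtcp_command` is on `PATH`. The helper names `build_checkpoint_command` and `schedule_checkpoint` are hypothetical, and the host and port values are placeholders for wherever the coordinator is actually running.

```python
import subprocess
import threading


def build_checkpoint_command(host, port):
    """Build the dmtcp_command call quoted above (host/port are placeholders)."""
    return ["dmtcp_command", "--checkpoint", f"--port={port}", f"--host={host}"]


def schedule_checkpoint(delay_seconds, host, port, run=subprocess.run):
    """Start a timer that requests a checkpoint once delay_seconds have passed.

    `run` is injectable so the sketch can be exercised without DMTCP installed;
    by default it executes the command with subprocess.run.
    """
    command = build_checkpoint_command(host, port)
    timer = threading.Timer(delay_seconds, run, args=(command,))
    timer.daemon = True  # do not keep the wrapper alive if the job ends first
    timer.start()
    return timer
```

In a job wrapper this might be called as `schedule_checkpoint(limit_seconds - margin, host, port)` right after the job starts, where `limit_seconds` and `margin` are values you would pick for your own queue.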
@MikeDacre: Please see PR #327. If you aren't running parallel processes, it might help. > The timer idea is a good one, but it doesn't handle cases where the job...
@MikeDacre: Sorry for the delay on this. I'm a little busy with several paper deadlines this month. > I still think that this feature should be added to dmtcp_launch as...
The plugin, by default, catches signal 36 (or `SIGRTMIN+2`; see: https://github.com/dmtcp/dmtcp/pull/327/files#diff-89f3a9b97b333fa35185f62a8a1471fbR6). You need to load the plugin using the `--with-plugin` option; ensure that the plugin has been built by...
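For testing by hand, here is a small Python sketch that sends `SIGRTMIN+2` to a process. It assumes Linux with glibc, where `SIGRTMIN` is typically 34, so `SIGRTMIN+2` works out to 36; on other platforms the number may differ. The names `CHECKPOINT_SIGNAL` and `send_checkpoint_signal` are made up for this example and are not part of DMTCP.

```python
import os
import signal

# Assumption: on Linux/glibc SIGRTMIN is typically 34, making this 36.
CHECKPOINT_SIGNAL = signal.SIGRTMIN + 2


def send_checkpoint_signal(pid):
    """Send SIGRTMIN+2 to the process with the given pid."""
    os.kill(pid, CHECKPOINT_SIGNAL)
```

Computing the value from `signal.SIGRTMIN` rather than hard-coding 36 keeps the sketch correct on systems where the real-time signal range starts at a different number.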
@heroxbd Do you want the program to exit after saving a checkpoint? Or do you want the program to simply exit when it receives the signal 36 from SLURM? Or...
Thanks for reporting this, @PhaethonPrime. I can reproduce this locally. It seems to affect only Python 3, not Python 2.x.
> So I did this on Dekaksi and never got an error, ... Did you use it for creating a local commit or for pushing an existing commit upstream to...
> An alternative workaround would be that in this case where the JASSERT would have triggered, we can instead invoke the original code: setenv("LD_PRELOAD", userPreload, 1); from an earlier version...
@mamelara: Sorry for the delay. Twinkle and I have identified the root cause, and we have a patch that fixes the issue. Twinkle is cleaning up the patch and preparing...