Gene Cooperman
And I think a lot of the problem is that if the application calls `mq_unlink()` to unlink the message queue before the checkpoint, then we get:
> [41000] ERROR at fileconnection.cpp:832...
And I've now removed the tests for `vim` and `screen` when using `make check-parallel`. (Those tests are special cases, and ordinary sequential `make check` will still run them. Also, I...
Hi Nathan, Thanks for this bug report. Also, it was great to meet you at the conference. Best, - Gene
Thanks, Nathan. I've now put in a pull request with these changes. It should appear in the next version of DMTCP. Best, - Gene
@MikeDacre, one way to catch a SLURM termination signal would be to add a signal handler in your code for the signal that SLURM will use. Then if your signal...
I thought that SLURM could be configured to send the signal of your choice with the amount of advance notice of your choice. Here is one relevant discussion: http://stackoverflow.com/questions/26802177/end-batch-job-before-kill-via-walltime which also...
Rohan, I vaguely remember you telling me that there was a single, particular version of Python (or other package?) that created a problem for DMTCP, but that all other version...
Rohan (@rohgarg), Does this strace output, below, remind you of anything? Twinkle, Were you able to test Python on Cori/NERSC yet, to see if you can reproduce the issue? Best,...
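Hi, Yes, you're correct. DMTCP does not support Valgrind. This is because both tools operate at a low level inside the process (Valgrind runs the program on its own synthetic CPU, while DMTCP wraps library calls and saves and restores the process's memory), so they interfere with each other. Some other tools that you...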
We haven't tried using DMTCP with ASAN here. It would be interesting to see what is blowing up. Can you recommend a Docker instance or some kind...