reproman's datalad-pair-run run record should probably store some "reproman run" construct?
ATM the datalad run commit record just stores the job ID in its cmd field, e.g. "cmd": "20190920-124832-7bf3". A full example:
(git-annex)hopa:…im/reproman-master/docs/usecases/bids-fmriprep-workflow-NP/out3[master]git-annex
$> reproman run --follow --input 'data/bids/sub-{p[sub]}' -r localshell --sub condor --orc datalad-pair-run --bp "sub=02,13" bash -c 'mkdir -p out; du -scb {inputs} > out/du-sub-{p[sub]}'
2019-09-20 12:48:33,658 [INFO ] No root directory supplied for localshell; using '/home/yoh/.reproman/run-root'
2019-09-20 12:48:34,327 [INFO ] Submitting 20190920-124832-7bf3
2019-09-20 12:48:34,362 [INFO ] Submitting /home/yoh/.reproman/run-root/3d36be08-da23-11e9-85fc-8019340ce7f2/.reproman/jobs/localshell/20190920-124832-7bf3/submit
2019-09-20 12:48:34,417 [INFO ] Job 20190920-124832-7bf3 submitted as condor job 20
2019-09-20 12:48:34,426 [INFO ] Registered job 20190920-124832-7bf3
2019-09-20 12:48:34,453 [INFO ] Waiting on job 20: running
2019-09-20 12:48:44,527 [INFO ] Fetching results for 20190920-124832-7bf3
2019-09-20 12:48:44,622 [INFO ] Creating run commit in /home/yoh/proj/repronim/reproman-master/docs/usecases/bids-fmriprep-workflow-NP/out3
2019-09-20 12:48:46,446 [INFO ] Unregistered job 20190920-124832-7bf3
(dev3) 1 28852.....................................:Fri 20 Sep 2019 12:48:46 PM EDT:.
(git-annex)hopa:…im/reproman-master/docs/usecases/bids-fmriprep-workflow-NP/out3[master]git-annex
$> git show --stat
commit b91e86dabf9983c6829d4c5fa3ba3b4a126d6148 (HEAD -> master, refs/reproman/20190920-124832-7bf3)
Author: Yaroslav Halchenko <[email protected]>
Date: Fri Sep 20 12:48:46 2019 -0400
[DATALAD RUNCMD] 20190920-124832-7bf3
=== Do not change lines below ===
{
"chain": [],
"cmd": "20190920-124832-7bf3",
"dsid": "3d36be08-da23-11e9-85fc-8019340ce7f2",
"exit": 0,
"extra_inputs": [],
"inputs": [
"data/bids/sub-{p[sub]}"
],
"outputs": [],
"pwd": ".",
"reproman_jobid": "20190920-124832-7bf3"
}
^^^ Do not change lines above ^^^
.reproman/jobs/localshell/20190920-124832-7bf3/command-array | 1 +
.reproman/jobs/localshell/20190920-124832-7bf3/idmap | 1 +
.reproman/jobs/localshell/20190920-124832-7bf3/pre-finished.0 | 1 +
.reproman/jobs/localshell/20190920-124832-7bf3/pre-finished.1 | 1 +
.reproman/jobs/localshell/20190920-124832-7bf3/runscript | 115 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
.reproman/jobs/localshell/20190920-124832-7bf3/status.0 | 1 +
.reproman/jobs/localshell/20190920-124832-7bf3/status.1 | 1 +
.reproman/jobs/localshell/20190920-124832-7bf3/stderr.0 | 1 +
.reproman/jobs/localshell/20190920-124832-7bf3/stderr.1 | 1 +
.reproman/jobs/localshell/20190920-124832-7bf3/stdout.0 | 3 +++
.reproman/jobs/localshell/20190920-124832-7bf3/stdout.1 | 4 ++++
.reproman/jobs/localshell/20190920-124832-7bf3/submit | 15 ++++++++++++++
.reproman/jobs/localshell/20190920-124832-7bf3/togethome | 17 ++++++++++++++++
out/du-sub-02 | 2 ++
out/du-sub-13 | 2 ++
Should it store the command to run there instead, i.e. sh .reproman/jobs/localshell/<JOBID>/command-array?
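To make the proposal concrete, here is a minimal sketch of pulling the run record out of a commit message and deriving the suggested cmd value from it. It relies only on the two marker lines and the reproman_jobid field visible in the commit above. The helper names extract_run_record and proposed_cmd are hypothetical (this is not datalad's own parser), and the resource name has to be passed in because the record shown does not store it.

```python
import json

# Marker lines exactly as they appear in the commit message above.
START = "=== Do not change lines below ==="
END = "^^^ Do not change lines above ^^^"


def extract_run_record(message):
    """Return the JSON run record embedded between the two marker lines.

    Hypothetical helper for illustration; not datalad's implementation.
    """
    # `git show` indents the message, so compare stripped lines.
    lines = [line.strip() for line in message.splitlines()]
    start = lines.index(START)
    end = lines.index(END)
    return json.loads("\n".join(lines[start + 1:end]))


def proposed_cmd(record, resource="localshell"):
    """Build the cmd value suggested in this issue (an assumption, not current behavior)."""
    return "sh .reproman/jobs/{}/{}/command-array".format(
        resource, record["reproman_jobid"])


message = """[DATALAD RUNCMD] 20190920-124832-7bf3

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "20190920-124832-7bf3",
 "dsid": "3d36be08-da23-11e9-85fc-8019340ce7f2",
 "exit": 0,
 "extra_inputs": [],
 "inputs": ["data/bids/sub-{p[sub]}"],
 "outputs": [],
 "pwd": ".",
 "reproman_jobid": "20190920-124832-7bf3"
}
^^^ Do not change lines above ^^^
"""

record = extract_run_record(message)
print(record["cmd"])          # the bare job ID currently stored
print(proposed_cmd(record))   # what the issue suggests storing instead
```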
Additional issue detected: in my case above, the command-array script seems to be missing a newline to separate the entries:
$> nl .reproman/jobs/localshell/20190920-124832-7bf3/command-array
1 bash -c 'mkdir -p out; du -scb data/bids/sub-02 > out/du-sub-02'bash -c 'mkdir -p out; du -scb data/bids/sub-13 > out/du-sub-13'
Actually, there is a 0x00 there as a separator, but it should be a newline.
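For inspecting such a file, a minimal sketch of splitting the content on the 0x00 byte is given below. The sample bytes reproduce the two entries shown by nl above. Whether the real file also ends with a trailing NUL is not visible here, so empty entries are dropped as a precaution. The function name split_command_array is made up for illustration.

```python
def split_command_array(raw):
    """Split NUL-separated command entries into a list of strings.

    Empty entries are dropped, so a trailing NUL (if any) is harmless.
    """
    return [entry.decode() for entry in raw.split(b"\0") if entry]


# Sample content mirroring the two entries shown above, joined by 0x00.
raw = (
    b"bash -c 'mkdir -p out; du -scb data/bids/sub-02 > out/du-sub-02'\0"
    b"bash -c 'mkdir -p out; du -scb data/bids/sub-13 > out/du-sub-13'"
)

commands = split_command_array(raw)
for index, command in enumerate(commands):
    print(index, command)
```

On a real job directory, the bytes would come from reading the command-array file in binary mode, e.g. open(path, "rb").read().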
After adjusting the cmd entry and fixing up that command array, I managed to datalad rerun it! whoohoo
$> datalad rerun
[INFO ] Making sure inputs are available (this may take some time)
[WARNING] Input does not exist: /home/yoh/proj/repronim/reproman-master/docs/usecases/bids-fmriprep-workflow-NP/out3/data/bids/sub-{p[sub]}
[INFO ] == Command start (output follows) =====
[INFO ] == Command exit (modification check follows) =====
action summary:
get (notneeded: 1)
save (notneeded: 5)
unlock (notneeded: 11)
So one point is that pure datalad of course had no clue how to treat job parameters in the inputs, so it is not entirely rerunnable, and we should think more about how to possibly make it so.
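The warning above comes from the literal placeholder data/bids/sub-{p[sub]} being treated as a path. As a rough sketch of what a rerun would need, the template can be expanded once per parameter set with plain Python str.format. This assumes that --bp "sub=02,13" corresponds to the two parameter sets below (consistent with the out/du-sub-02 and out/du-sub-13 outputs); expand_inputs is a hypothetical helper and does not show how reproman expands inputs internally.

```python
def expand_inputs(templates, batch_params):
    """Expand each input template once per batch parameter set.

    The `p` name matches the placeholder used on the command line.
    """
    return [
        template.format(p=params)
        for params in batch_params
        for template in templates
    ]


# Input template as recorded in the run record's "inputs" field.
templates = ["data/bids/sub-{p[sub]}"]
# Assumed expansion of --bp "sub=02,13" into one dict per subjob.
batch_params = [{"sub": "02"}, {"sub": "13"}]

expanded = expand_inputs(templates, batch_params)
print(expanded)
```

Storing either the expanded paths or the parameter sets alongside the template in the run record would be one way to give a rerun enough information, though which option fits best is an open design question here.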
Additional issue detected: in my case above command-array script seems to be missing a new line to separate separate entries: [...] actually there is a 0x00 there as a separator, but should be a new line.
Yes, the commands are separated by NULs. Why should there be a new line?
so one point is that pure datalad of course had no clue on how to treat job parameters in the inputs, so it is not entirely rerunnable and we should think more on how to possibly make it so.
This is an outstanding issue that needs to be dealt with. Quoting from #458:
reproman run records for concurrent jobs are not compatible with datalad rerun. See the run record bullet point in de60efa (NF: orchestrators: Support concurrent jobs, 2019-05-16) and
https://github.com/ReproNim/reproman/blob/7c8800e3fdedf0471584f1040f2e35025f33fe2d/reproman/support/jobs/orchestrators.py#L978-L989