reproman's datalad-pair-run run record should probably store some "reproman run" construct?

Open yarikoptic opened this issue 5 years ago • 1 comments

ATM the datalad run commit record just stores the job ID in its cmd field, e.g. "cmd": "20190920-124832-7bf3".

a full example
(git-annex)hopa:…im/reproman-master/docs/usecases/bids-fmriprep-workflow-NP/out3[master]git-annex
$> reproman run --follow --input 'data/bids/sub-{p[sub]}' -r localshell --sub condor --orc datalad-pair-run --bp "sub=02,13" bash -c 'mkdir -p out; du -scb {inputs} > out/du-sub-{p[sub]}'                                                                                             
2019-09-20 12:48:33,658 [INFO   ] No root directory supplied for localshell; using '/home/yoh/.reproman/run-root' 
2019-09-20 12:48:34,327 [INFO   ] Submitting 20190920-124832-7bf3                                                                                                            
2019-09-20 12:48:34,362 [INFO   ] Submitting /home/yoh/.reproman/run-root/3d36be08-da23-11e9-85fc-8019340ce7f2/.reproman/jobs/localshell/20190920-124832-7bf3/submit 
2019-09-20 12:48:34,417 [INFO   ] Job 20190920-124832-7bf3 submitted as condor job 20 
2019-09-20 12:48:34,426 [INFO   ] Registered job 20190920-124832-7bf3 
2019-09-20 12:48:34,453 [INFO   ] Waiting on job 20: running 
2019-09-20 12:48:44,527 [INFO   ] Fetching results for 20190920-124832-7bf3 
2019-09-20 12:48:44,622 [INFO   ] Creating run commit in /home/yoh/proj/repronim/reproman-master/docs/usecases/bids-fmriprep-workflow-NP/out3 
2019-09-20 12:48:46,446 [INFO   ] Unregistered job 20190920-124832-7bf3                                                                                                      
(dev3) 1 28852.....................................:Fri 20 Sep 2019 12:48:46 PM EDT:.
(git-annex)hopa:…im/reproman-master/docs/usecases/bids-fmriprep-workflow-NP/out3[master]git-annex
$> git show --stat
commit b91e86dabf9983c6829d4c5fa3ba3b4a126d6148 (HEAD -> master, refs/reproman/20190920-124832-7bf3)
Author: Yaroslav Halchenko <[email protected]>
Date:   Fri Sep 20 12:48:46 2019 -0400

    [DATALAD RUNCMD] 20190920-124832-7bf3
    
    === Do not change lines below ===
    {
     "chain": [],
     "cmd": "20190920-124832-7bf3",
     "dsid": "3d36be08-da23-11e9-85fc-8019340ce7f2",
     "exit": 0,
     "extra_inputs": [],
     "inputs": [
      "data/bids/sub-{p[sub]}"
     ],
     "outputs": [],
     "pwd": ".",
     "reproman_jobid": "20190920-124832-7bf3"
    }
    ^^^ Do not change lines above ^^^

 .reproman/jobs/localshell/20190920-124832-7bf3/command-array  |   1 +
 .reproman/jobs/localshell/20190920-124832-7bf3/idmap          |   1 +
 .reproman/jobs/localshell/20190920-124832-7bf3/pre-finished.0 |   1 +
 .reproman/jobs/localshell/20190920-124832-7bf3/pre-finished.1 |   1 +
 .reproman/jobs/localshell/20190920-124832-7bf3/runscript      | 115 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 .reproman/jobs/localshell/20190920-124832-7bf3/status.0       |   1 +
 .reproman/jobs/localshell/20190920-124832-7bf3/status.1       |   1 +
 .reproman/jobs/localshell/20190920-124832-7bf3/stderr.0       |   1 +
 .reproman/jobs/localshell/20190920-124832-7bf3/stderr.1       |   1 +
 .reproman/jobs/localshell/20190920-124832-7bf3/stdout.0       |   3 +++
 .reproman/jobs/localshell/20190920-124832-7bf3/stdout.1       |   4 ++++
 .reproman/jobs/localshell/20190920-124832-7bf3/submit         |  15 ++++++++++++++
 .reproman/jobs/localshell/20190920-124832-7bf3/togethome      |  17 ++++++++++++++++
 out/du-sub-02                                                 |   2 ++
 out/du-sub-13                                                 |   2 ++

Should it store the command to run there instead, i.e. sh .reproman/jobs/localshell/<JOBID>/command-array?
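For illustration, here is a minimal Python sketch of what such a rewrite of the record could look like. It assumes the run record is the JSON block between the two "Do not change" marker lines shown in the commit message above; the function name `rewrite_cmd` and its `resource` parameter are hypothetical and not part of reproman or datalad.

```python
import json

# Marker lines copied from the commit message shown above.
BELOW = "=== Do not change lines below ==="
ABOVE = "^^^ Do not change lines above ^^^"


def rewrite_cmd(message, resource="localshell"):
    """Return the commit message with "cmd" pointing at the job's command-array.

    Hypothetical helper: `resource` is an assumption, since the record itself
    does not store the resource name used in the .reproman/jobs/ path.
    """
    head, rest = message.split(BELOW, 1)
    body, tail = rest.split(ABOVE, 1)
    record = json.loads(body)
    jobid = record["reproman_jobid"]
    record["cmd"] = f"sh .reproman/jobs/{resource}/{jobid}/command-array"
    # Serialize with one-space indentation, as in the record shown above.
    new_body = json.dumps(record, indent=1, sort_keys=True)
    return f"{head}{BELOW}\n{new_body}\n{ABOVE}{tail}"
```

This only shows the shape of the change. Whether a record rewritten this way is enough for datalad rerun also depends on the command-array format and on how the inputs are recorded.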

Additional issue detected: in my case above, the command-array script seems to be missing a newline to separate the entries:

$> nl .reproman/jobs/localshell/20190920-124832-7bf3/command-array
     1	bash -c 'mkdir -p out; du -scb data/bids/sub-02 > out/du-sub-02'bash -c 'mkdir -p out; du -scb data/bids/sub-13 > out/du-sub-13'

Actually, there is a 0x00 there as a separator, but it should be a newline.
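To make the separator visible, a small Python sketch can split the file contents on the 0x00 byte and re-join them with newlines. Both function names are hypothetical, and the sketch assumes each entry is plain UTF-8 text with no embedded NUL.

```python
def split_command_array(data):
    """Split the raw bytes of a command-array file into one string per command."""
    # Entries are separated by NUL (0x00); skip empty pieces such as a trailing one.
    return [chunk.decode("utf-8") for chunk in data.split(b"\0") if chunk]


def to_newline_separated(data):
    """Return the same commands as text, one per line."""
    return "\n".join(split_command_array(data)) + "\n"
```

Reading the file with `open(path, "rb").read()` and passing the bytes to `to_newline_separated` would give a script that `sh` runs entry by entry. Note that this would break any command that itself contains a newline.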

After adjusting the cmd entry and fixing up that command array, I managed to datalad rerun it! whoohoo

$> datalad rerun                                                   
[INFO   ] Making sure inputs are available (this may take some time) 
[WARNING] Input does not exist: /home/yoh/proj/repronim/reproman-master/docs/usecases/bids-fmriprep-workflow-NP/out3/data/bids/sub-{p[sub]} 
[INFO   ] == Command start (output follows) ===== 
[INFO   ] == Command exit (modification check follows) ===== 
action summary:                                                                                                                                                              
  get (notneeded: 1)
  save (notneeded: 5)
  unlock (notneeded: 11)

So one point is that pure datalad of course had no clue how to treat the job parameters in the inputs, so the record is not entirely rerunnable, and we should think more about how to make it so.
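As a rough sketch of what is missing on the datalad side: the recorded input 'data/bids/sub-{p[sub]}' is a template that has to be expanded once per batch parameter value before it names a real path. The Python below assumes the placeholder follows str.format syntax with a mapping named p, and it parses only the simple "key=v1,v2" form used in the --bp option above. Both helper names are hypothetical, and reproman's actual parsing may differ.

```python
def parse_bp(spec):
    """Turn a spec like "sub=02,13" into [{"sub": "02"}, {"sub": "13"}].

    Handles only the single-key form from the example above.
    """
    key, values = spec.split("=", 1)
    return [{key: value} for value in values.split(",")]


def expand_inputs(inputs, batch_params):
    """Expand each input template once per parameter set."""
    return [tmpl.format(p=p) for p in batch_params for tmpl in inputs]
```

With the values from the example, `expand_inputs(["data/bids/sub-{p[sub]}"], parse_bp("sub=02,13"))` gives data/bids/sub-02 and data/bids/sub-13, which are the paths the warning in the rerun output fails to resolve.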

yarikoptic, Sep 20 '19 17:09

Additional issue detected: in my case above, the command-array script seems to be missing a newline to separate the entries: [...] Actually, there is a 0x00 there as a separator, but it should be a newline.

Yes, the commands are separated by NULs. Why should there be a new line?

So one point is that pure datalad of course had no clue how to treat the job parameters in the inputs, so the record is not entirely rerunnable, and we should think more about how to make it so.

This is an outstanding issue that needs to be dealt with. Quoting from #458:

reproman run records for concurrent jobs are not compatible with datalad rerun. See the run record bullet point in de60efa (NF: orchestrators: Support concurrent jobs, 2019-05-16) and

https://github.com/ReproNim/reproman/blob/7c8800e3fdedf0471584f1040f2e35025f33fe2d/reproman/support/jobs/orchestrators.py#L978-L989

kyleam, Sep 23 '19 16:09