toil ignores LoadListingRequirement
When using toil-cwl-runner, it seems to always use deep_listing when files or directories are mounted for running inside Docker or Singularity.
If I add LoadListingRequirement and set it to shallow_listing, this is simply ignored. With cwltool it works as expected.
Issue is synchronized with this Jira Story. Issue Number: TOIL-1649
toil-cwl-runner should be differentiating between the listing values internally (https://github.com/DataBiosphere/toil/blob/8faca0f1be4c4ade77b7d34ff9f860cbce5db31b/src/toil/cwl/cwltoil.py#L3456-L3512), though it seems like that isn't working.
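For reference, the three loadListing values only change how much of a Directory object's listing is filled in for expressions. A minimal sketch, assuming a local POSIX path and using a hypothetical helper name (build_directory_object; this is not Toil's or cwltool's code):

```python
import os
from typing import Any

VALID = ("no_listing", "shallow_listing", "deep_listing")


def build_directory_object(path: str, load_listing: str) -> dict[str, Any]:
    """Build a CWL-style Directory object for `path` at the given loadListing level."""
    if load_listing not in VALID:
        raise ValueError(f"unknown loadListing value: {load_listing}")
    obj: dict[str, Any] = {
        "class": "Directory",
        "basename": os.path.basename(path),
        "path": path,
    }
    if load_listing == "no_listing":
        return obj  # no "listing" key at all

    with os.scandir(path) as it:
        entries = sorted(it, key=lambda e: e.name)

    listing: list[dict[str, Any]] = []
    for entry in entries:
        if entry.is_dir():
            if load_listing == "deep_listing":
                # Recurse so every nested directory gets its own listing.
                listing.append(build_directory_object(entry.path, load_listing))
            else:
                # shallow_listing: name the subdirectory but do not list its contents.
                listing.append(
                    {"class": "Directory", "basename": entry.name, "path": entry.path}
                )
        else:
            listing.append({"class": "File", "basename": entry.name, "path": entry.path})
    obj["listing"] = listing
    return obj
```

With a layout of directory/file.txt and directory/directory/file2.txt, shallow_listing lists the subdirectory and file.txt but not file2.txt, while deep_listing also lists file2.txt inside the subdirectory.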
@mr-c Is there a CWL conformance test that covers this requirement?
@stxue1 There are: listing_requirement_none, listing_requirement_shallow, and listing_requirement_deep.
However, they are all standalone CommandLineTools, not part of workflows, and none of them uses DockerRequirement.
Looks like we should create additional conformance tests once this bug is figured out.
@adrabent , thank you for reporting this!
Seems like there are conformance tests for LoadListingRequirement: https://github.com/common-workflow-language/cwl-v1.2/blob/15d152dbf04f149845d9348c80694a377c558346/conformance_tests.yaml#L2987-L3069
toil-cwl-runner seems to pass these. @adrabent Could you provide an example where LoadListingRequirement is not working?
I've tried testing LoadListingRequirement on a CWL expression:
#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: CommandLineTool
requirements:
  InlineJavascriptRequirement: {}
  LoadListingRequirement:
    loadListing: shallow_listing
inputs:
  input_directory:
    type: Directory
outputs:
  output_file:
    type: string
    outputBinding:
      outputEval: $(JSON.stringify(inputs.input_directory))
  stdout_file:
    type: stdout
stdout: output.txt
baseCommand: tree
arguments:
  - $(inputs.input_directory)
With a JSON input of
{
"input_directory": {"class": "Directory", "location": "directory"}
}
And this directory in the current working directory:
heaucques@pop-os:~/Documents/toil$ tree directory
directory
├── directory
│ └── file2.txt
└── file.txt
1 directory, 2 files
After running
toil-cwl-runner shallow_listing.cwl shallow_listing.json > json.txt && jq -r .output_file json.txt | jq .
the expression's view of the directory appears to follow the shallow_listing specified in the LoadListingRequirement:
{
"class": "Directory",
"location": "toildir:eyJkaXJlY3RvcnkiOiB7ImZpbGUyLnR4dCI6ICJ0b2lsZmlsZTowOjA6ZmlsZXMvbm8tam9iL2ZpbGUtMzY2NGYyMTM5NDIwNGE5ZDk1Mjc5ZjMzNjY2MTYzNGIvZmlsZTIudHh0In0sICJmaWxlLnR4dCI6ICJ0b2lsZmlsZTowOjA6ZmlsZXMvbm8tam9iL2ZpbGUtODdiMGZhM2M1Y2Q4NGRlYjk4ZDQ4NzQwMmE2N2MyNWUvZmlsZS50eHQifQ==",
"basename": "directory",
"listing": [
{
"class": "Directory",
"location": "toildir:eyJkaXJlY3RvcnkiOiB7ImZpbGUyLnR4dCI6ICJ0b2lsZmlsZTowOjA6ZmlsZXMvbm8tam9iL2ZpbGUtMzY2NGYyMTM5NDIwNGE5ZDk1Mjc5ZjMzNjY2MTYzNGIvZmlsZTIudHh0In0sICJmaWxlLnR4dCI6ICJ0b2lsZmlsZTowOjA6ZmlsZXMvbm8tam9iL2ZpbGUtODdiMGZhM2M1Y2Q4NGRlYjk4ZDQ4NzQwMmE2N2MyNWUvZmlsZS50eHQifQ==/directory",
"basename": "directory",
"path": "/tmp/toilwf-4e400301906f5de59be0e33be86e6fa2/54d7/job/tmpwrtll37t_15n7s00/stg3762e2b2-57a2-4761-9c00-656c385acb3a/directory/directory",
"dirname": "/tmp/toilwf-4e400301906f5de59be0e33be86e6fa2/54d7/job/tmpwrtll37t_15n7s00/stg3762e2b2-57a2-4761-9c00-656c385acb3a/directory"
},
{
"class": "File",
"location": "toildir:eyJkaXJlY3RvcnkiOiB7ImZpbGUyLnR4dCI6ICJ0b2lsZmlsZTowOjA6ZmlsZXMvbm8tam9iL2ZpbGUtMzY2NGYyMTM5NDIwNGE5ZDk1Mjc5ZjMzNjY2MTYzNGIvZmlsZTIudHh0In0sICJmaWxlLnR4dCI6ICJ0b2lsZmlsZTowOjA6ZmlsZXMvbm8tam9iL2ZpbGUtODdiMGZhM2M1Y2Q4NGRlYjk4ZDQ4NzQwMmE2N2MyNWUvZmlsZS50eHQifQ==/file.txt",
"basename": "file.txt",
"size": 0,
"path": "/tmp/toilwf-4e400301906f5de59be0e33be86e6fa2/54d7/job/tmpwrtll37t_15n7s00/stg3762e2b2-57a2-4761-9c00-656c385acb3a/directory/file.txt",
"dirname": "/tmp/toilwf-4e400301906f5de59be0e33be86e6fa2/54d7/job/tmpwrtll37t_15n7s00/stg3762e2b2-57a2-4761-9c00-656c385acb3a/directory",
"nameroot": "file",
"nameext": ".txt"
}
],
"path": "/tmp/toilwf-4e400301906f5de59be0e33be86e6fa2/54d7/job/tmpwrtll37t_15n7s00/stg3762e2b2-57a2-4761-9c00-656c385acb3a/directory",
"dirname": "/tmp/toilwf-4e400301906f5de59be0e33be86e6fa2/54d7/job/tmpwrtll37t_15n7s00/stg3762e2b2-57a2-4761-9c00-656c385acb3a"
}
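The toildir: location above appears to be a base64-encoded JSON tree of the directory's full contents, with sub-paths appended after a "/". A rough sketch for decoding it, assuming URL-safe base64 (inferred from this output, not checked against Toil's source); decode_toildir and list_files are hypothetical helpers:

```python
import base64
import json

TOILDIR_PREFIX = "toildir:"


def decode_toildir(location: str) -> tuple[dict, str]:
    """Split a toildir: location into (decoded contents tree, sub-path)."""
    if not location.startswith(TOILDIR_PREFIX):
        raise ValueError(f"not a toildir: location: {location}")
    encoded, _, subpath = location[len(TOILDIR_PREFIX):].partition("/")
    # Assumption: URL-safe base64, so "/" can only be the sub-path separator.
    contents = json.loads(base64.urlsafe_b64decode(encoded))
    return contents, subpath


def list_files(tree: dict, prefix: str = "") -> list[str]:
    """Recursively list file paths in a decoded tree (nested dicts are directories)."""
    paths: list[str] = []
    for name, value in sorted(tree.items()):
        if isinstance(value, dict):
            paths.extend(list_files(value, f"{prefix}{name}/"))
        else:
            paths.append(f"{prefix}{name}")
    return paths
```

For the top-level location above, list_files returns ['directory/file2.txt', 'file.txt'], so the encoded tree still includes file2.txt even though the shallow listing stops at the subdirectory.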
(venv3.12) heaucques@pop-os:~/Documents/toil$ cat output.txt
/tmp/toilwf-4e400301906f5de59be0e33be86e6fa2/54d7/job/tmpwrtll37t_15n7s00/stg3762e2b2-57a2-4761-9c00-656c385acb3a/directory
|-- directory
| `-- file2.txt
`-- file.txt
1 directory, 2 files
The tree output is fully recursive, as expected, because binding the directory into the container is not controlled by LoadListingRequirement.
So I'm unsure how to replicate this for now.
Dear @stxue1 and @mr-c,
I tried to reproduce the listing behaviour with a minimal example workflow as well. Interestingly, though, I need to call the step twice to make the differences visible.
minimal_example.cwl
class: Workflow
cwlVersion: v1.2
id: minimal_example
label: minimal_example
inputs:
  - id: msin
    type: Directory[]
outputs:
  - id: msout
    outputSource:
      - second_pass/msout
    type: Directory[]
steps:
  - id: pass
    in:
      - id: msin
        source: msin
    out:
      - id: msout
    run: pass.cwl
  - id: second_pass
    in:
      - id: msin
        source: pass/msout
    out:
      - id: msout
    run: pass.cwl
pass.cwl
class: CommandLineTool
cwlVersion: v1.2
id: pass
baseCommand: echo
inputs:
  - id: msin
    type:
      - Directory[]
outputs:
  - id: msout
    type:
      - Directory[]
    outputBinding:
      outputEval: $(inputs.msin)
requirements:
  - class: LoadListingRequirement
    loadListing: no_listing
  - class: InplaceUpdateRequirement
    inplaceUpdate: true
  - class: DockerRequirement
    dockerPull: ubuntu:22.04
To make the problem visible, I need to use both a Docker container and InplaceUpdateRequirement.
I created a directory with some subdirectories and used it as input:
debug.json
{
"msin": [ {"class": "Directory", "location": "/home/alex/debug/builds"}]
}
Running cwltool minimal_workflow.cwl debug.json then gives:
[INFO] /usr/local/miniconda3/envs/toil/bin/cwltool 3.1.20240508115724
[INFO] Resolved 'minimal_workflow.cwl' to 'file:///home/alex/debug/minimal_workflow.cwl'
[INFO] [workflow ] start
[INFO] [workflow ] starting step pass
[INFO] [step pass] start
[INFO] [job pass] /tmp/i6rom9sl$ docker \
run \
-i \
--mount=type=bind,source=/tmp/i6rom9sl,target=/UXLgnu \
--mount=type=bind,source=/tmp/i7ye065p,target=/tmp \
--mount=type=bind,source=/home/alex/debug/builds,target=/var/lib/cwl/stgbe41ee57-608b-47a8-86b5-e048c0d62367/builds,readonly \
--workdir=/UXLgnu \
--read-only=true \
--net=none \
--user=1067:200 \
--rm \
--cidfile=/tmp/_y0zkqv1/20241009110104-450740.cid \
--env=TMPDIR=/tmp \
--env=HOME=/UXLgnu \
ubuntu:22.04 \
echo
[INFO] [job pass] completed success
[INFO] [step pass] completed success
[INFO] [workflow ] starting step second_pass
[INFO] [step second_pass] start
[INFO] [job second_pass] /tmp/tnws_zxh$ docker \
run \
-i \
--mount=type=bind,source=/tmp/tnws_zxh,target=/UXLgnu \
--mount=type=bind,source=/tmp/1o8o8v3r,target=/tmp \
--mount=type=bind,source=/home/alex/debug/builds,target=/var/lib/cwl/stg239cbc5c-6ff3-4209-b1cb-a8c20ced43f6/builds,readonly \
--workdir=/UXLgnu \
--read-only=true \
--net=none \
--user=1067:200 \
--rm \
--cidfile=/tmp/h6if9iyv/20241009110105-477120.cid \
--env=TMPDIR=/tmp \
--env=HOME=/UXLgnu \
ubuntu:22.04 \
echo
[INFO] [job second_pass] completed success
[INFO] [step second_pass] completed success
[INFO] [workflow ] completed success
[INFO] Final process status is success
If I change no_listing to shallow_listing in pass.cwl, the setting seems to be ignored in the first call but not in the second: in the second call, cwltool also starts mounting subdirectories (which is not intended):
[INFO] /usr/local/miniconda3/envs/toil/bin/cwltool 3.1.20240508115724
[INFO] Resolved 'minimal_workflow.cwl' to 'file:///home/alex/debug/minimal_workflow.cwl'
[INFO] [workflow ] start
[INFO] [workflow ] starting step pass
[INFO] [step pass] start
[INFO] [job pass] /tmp/o0lujins$ docker \
run \
-i \
--mount=type=bind,source=/tmp/o0lujins,target=/VfEAOV \
--mount=type=bind,source=/tmp/07rtl11n,target=/tmp \
--mount=type=bind,source=/home/alex/debug/builds,target=/var/lib/cwl/stg92bbd5cd-dfd7-46d6-a9e2-44ecdff76e0f/builds,readonly \
--workdir=/VfEAOV \
--read-only=true \
--net=none \
--user=1067:200 \
--rm \
--cidfile=/tmp/6mhf0toc/20241009110459-557386.cid \
--env=TMPDIR=/tmp \
--env=HOME=/VfEAOV \
ubuntu:22.04 \
echo
[INFO] [job pass] completed success
[INFO] [step pass] completed success
[INFO] [workflow ] starting step second_pass
[INFO] [step second_pass] start
[INFO] [job second_pass] /tmp/437vdd5t$ docker \
run \
-i \
--mount=type=bind,source=/tmp/437vdd5t,target=/VfEAOV \
--mount=type=bind,source=/tmp/81hlfjpv,target=/tmp \
--mount=type=bind,source=/home/alex/debug/builds,target=/var/lib/cwl/stg205417f7-4be6-465f-9a99-bdc3ada36b12/builds,readonly \
--mount=type=bind,source=/home/alex/debug/builds/RD,target=/var/lib/cwl/stg205417f7-4be6-465f-9a99-bdc3ada36b12/builds/RD,readonly \
--workdir=/VfEAOV \
--read-only=true \
--net=none \
--user=1067:200 \
--rm \
--cidfile=/tmp/vfobvl3n/20241009110500-586300.cid \
--env=TMPDIR=/tmp \
--env=HOME=/VfEAOV \
ubuntu:22.04 \
echo
[INFO] [job second_pass] completed success
[INFO] [step second_pass] completed success
[INFO] [workflow ] completed success
[INFO] Final process status is success
With deep_listing in pass.cwl, it really does descend into all subdirectories (again, only in the second call of the same step) and mounts every single subdirectory separately:
[INFO] /usr/local/miniconda3/envs/toil/bin/cwltool 3.1.20240508115724
[INFO] Resolved 'minimal_workflow.cwl' to 'file:///home/alex/debug/minimal_workflow.cwl'
[INFO] [workflow ] start
[INFO] [workflow ] starting step pass
[INFO] [step pass] start
[INFO] [job pass] /tmp/125_lxcg$ docker \
run \
-i \
--mount=type=bind,source=/tmp/125_lxcg,target=/AfxLQQ \
--mount=type=bind,source=/tmp/9kxi9_pe,target=/tmp \
--mount=type=bind,source=/home/alex/debug/builds,target=/var/lib/cwl/stgb5eacd69-fbce-48fd-93b9-cbc8fcb59312/builds,readonly \
--workdir=/AfxLQQ \
--read-only=true \
--net=none \
--user=1067:200 \
--rm \
--cidfile=/tmp/zbdj5iz7/20241009110630-262057.cid \
--env=TMPDIR=/tmp \
--env=HOME=/AfxLQQ \
ubuntu:22.04 \
echo
[INFO] [job pass] completed success
[INFO] [step pass] completed success
[INFO] [workflow ] starting step second_pass
[INFO] [step second_pass] start
[INFO] [job second_pass] /tmp/aspholzl$ docker \
run \
-i \
--mount=type=bind,source=/tmp/aspholzl,target=/AfxLQQ \
--mount=type=bind,source=/tmp/qqsz9v_s,target=/tmp \
--mount=type=bind,source=/home/alex/debug/builds,target=/var/lib/cwl/stg2721cb57-c9d2-43cb-b63f-79fa788b4072/builds,readonly \
--mount=type=bind,source=/home/alex/debug/builds/RD,target=/var/lib/cwl/stg2721cb57-c9d2-43cb-b63f-79fa788b4072/builds/RD,readonly \
--mount=type=bind,source=/home/alex/debug/builds/RD/LINC,target=/var/lib/cwl/stg2721cb57-c9d2-43cb-b63f-79fa788b4072/builds/RD/LINC,readonly \
--mount=type=bind,source=/home/alex/debug/builds/RD/LINC/results,target=/var/lib/cwl/stg2721cb57-c9d2-43cb-b63f-79fa788b4072/builds/RD/LINC/results,readonly \
--workdir=/AfxLQQ \
--read-only=true \
--net=none \
--user=1067:200 \
--rm \
--cidfile=/tmp/beno5qnm/20241009110631-291241.cid \
--env=TMPDIR=/tmp \
--env=HOME=/AfxLQQ \
ubuntu:22.04 \
echo
[INFO] [job second_pass] completed success
[INFO] [step second_pass] completed success
[INFO] [workflow ] completed success
[INFO] Final process status is success
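To compare runs without eyeballing the logs, the bind mounts can be pulled out of the logged docker command. A small sketch; parse_bind_mounts is a hypothetical helper, and it only understands the long key names (source, target) and the bare readonly flag that appear in these logs, not Docker's aliases such as src or ro:

```python
def parse_bind_mounts(command_lines: list[str]) -> list[tuple[str, str, bool]]:
    """Extract (source, target, readonly) for each --mount=type=bind,... argument."""
    mounts: list[tuple[str, str, bool]] = []
    for line in command_lines:
        # Drop surrounding whitespace and the trailing shell line continuation.
        arg = line.strip().rstrip("\\").strip()
        if not arg.startswith("--mount="):
            continue
        fields: dict[str, str] = {}
        flags: set[str] = set()
        for part in arg[len("--mount="):].split(","):
            key, sep, value = part.partition("=")
            if sep:
                fields[key] = value
            else:
                flags.add(key)  # bare options such as "readonly"
        if fields.get("type") == "bind":
            mounts.append((fields["source"], fields["target"], "readonly" in flags))
    return mounts
```

Counting the mounts whose source starts with /home/alex/debug/builds gives 1 for the first call and 4 for the second call in the deep_listing log above.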
If I repeat this exercise with toil-cwl-runner, it makes absolutely no difference whether I use no_listing, shallow_listing or deep_listing; it always behaves the way cwltool does with deep_listing:
[2024-10-09T11:10:12+0200] [MainThread] [I] [cwltool] Resolved 'minimal_workflow.cwl' to 'file:///home/alex/debug/minimal_workflow.cwl'
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.cwl.cwltoil] Importing input files...
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.cwl.cwltoil] Importing tool-associated files...
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.cwl.cwltoil] Creating root job
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.cwl.cwltoil] Starting workflow
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil] Running Toil version 7.0.0-d569ea5711eb310ffd5703803f7250ebf7c19576 on host transitix.
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil] Running Toil version 7.0.0-d569ea5711eb310ffd5703803f7250ebf7c19576 on host transitix.
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.worker] Working on job 'CWLWorkflow' minimal_example kind-CWLWorkflow/instance-rglr4hss v1
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.worker] Loaded body Job('CWLWorkflow' minimal_example kind-CWLWorkflow/instance-rglr4hss v1) from description 'CWLWorkflow' minimal_example kind-CWLWorkflow/instance-rglr4hss v1
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.worker] Completed body for 'CWLWorkflow' minimal_example kind-CWLWorkflow/instance-rglr4hss v2
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.worker] Not chaining from job 'CWLWorkflow' minimal_example kind-CWLWorkflow/instance-rglr4hss v2
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.worker] Finished running the chain of jobs on this node, we ran for a total of 0.035325 seconds
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.leader] 0 jobs are running, 0 jobs are issued and waiting to run
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil] Running Toil version 7.0.0-d569ea5711eb310ffd5703803f7250ebf7c19576 on host transitix.
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.worker] Working on job 'CWLJob' minimal_example.pass.pass kind-CWLJob/instance-0qldyysz v1
[2024-10-09T11:10:13+0200] [MainThread] [I] [toil.worker] Loaded body Job('CWLJob' minimal_example.pass.pass kind-CWLJob/instance-0qldyysz v1) from description 'CWLJob' minimal_example.pass.pass kind-CWLJob/instance-0qldyysz v1
[2024-10-09T11:10:13+0200] [MainThread] [W] [cwltool] [job minimal_example.pass.pass] Skipping Docker software container '--memory' limit despite presence of ResourceRequirement with ramMin and/or ramMax setting. Consider running with --strict-memory-limit for increased portability assurance.
[2024-10-09T11:10:13+0200] [MainThread] [W] [cwltool] [job minimal_example.pass.pass] Skipping Docker software container '--cpus' limit despite presence of ResourceRequirement with coresMin and/or coresMax setting. Consider running with --strict-cpu-limit for increased portability assurance.
[2024-10-09T11:10:13+0200] [MainThread] [I] [cwltool] [job minimal_example.pass.pass] /tmp/tmprlpoc39q$ docker \
run \
-i \
--mount=type=bind,source=/tmp/tmprlpoc39q,target=/vDiunC \
--mount=type=bind,source=/tmp/tmp83g29jku,target=/tmp \
--mount=type=bind,source=/home/alex/debug/builds,target=/var/lib/cwl/stg975bfb8f-33b6-46c1-bc8a-e94af3a2d8d5/builds,readonly \
--workdir=/vDiunC \
--read-only=true \
--net=none \
--user=1067:200 \
--rm \
--cidfile=/tmp/tmpdgot8yyw/20241009111013-856420.cid \
--env=TMPDIR=/tmp \
--env=HOME=/vDiunC \
ubuntu:22.04 \
echo
[2024-10-09T11:10:14+0200] [MainThread] [I] [cwltool] [job minimal_example.pass.pass] completed success
[2024-10-09T11:10:14+0200] [MainThread] [I] [toil.fileStores.abstractFileStore] LOG-TO-MASTER: CWL step complete: minimal_example.pass.pass
[2024-10-09T11:10:14+0200] [MainThread] [I] [toil.worker] Completed body for 'CWLJob' minimal_example.pass.pass kind-CWLJob/instance-0qldyysz v2
[2024-10-09T11:10:14+0200] [MainThread] [I] [toil.worker] Chaining from 'CWLJob' minimal_example.pass.pass kind-CWLJob/instance-0qldyysz v2 to 'CWLJob' minimal_example.second_pass.pass kind-CWLJob/instance-15i34646 v1
[2024-10-09T11:10:14+0200] [MainThread] [I] [toil.worker] Working on job 'CWLJob' minimal_example.second_pass.pass kind-CWLJob/instance-0qldyysz v3
[2024-10-09T11:10:14+0200] [MainThread] [I] [toil.worker] Loaded body Job('CWLJob' minimal_example.second_pass.pass kind-CWLJob/instance-0qldyysz v3) from description 'CWLJob' minimal_example.second_pass.pass kind-CWLJob/instance-0qldyysz v3
[2024-10-09T11:10:14+0200] [MainThread] [W] [cwltool] [job minimal_example.second_pass.pass] Skipping Docker software container '--memory' limit despite presence of ResourceRequirement with ramMin and/or ramMax setting. Consider running with --strict-memory-limit for increased portability assurance.
[2024-10-09T11:10:14+0200] [MainThread] [W] [cwltool] [job minimal_example.second_pass.pass] Skipping Docker software container '--cpus' limit despite presence of ResourceRequirement with coresMin and/or coresMax setting. Consider running with --strict-cpu-limit for increased portability assurance.
[2024-10-09T11:10:14+0200] [MainThread] [I] [cwltool] [job minimal_example.second_pass.pass] /tmp/tmpjhmb_xso$ docker \
run \
-i \
--mount=type=bind,source=/tmp/tmpjhmb_xso,target=/vDiunC \
--mount=type=bind,source=/tmp/tmp6a70ca7k,target=/tmp \
--mount=type=bind,source=/home/alex/debug/builds,target=/var/lib/cwl/stgdd1642d5-9d4d-4b22-bc67-a22df69f7bfe/builds,readonly \
--mount=type=bind,source=/home/alex/debug/builds/RD,target=/var/lib/cwl/stgdd1642d5-9d4d-4b22-bc67-a22df69f7bfe/builds/RD,readonly \
--mount=type=bind,source=/home/alex/debug/builds/RD/LINC,target=/var/lib/cwl/stgdd1642d5-9d4d-4b22-bc67-a22df69f7bfe/builds/RD/LINC,readonly \
--mount=type=bind,source=/home/alex/debug/builds/RD/LINC/results,target=/var/lib/cwl/stgdd1642d5-9d4d-4b22-bc67-a22df69f7bfe/builds/RD/LINC/results,readonly \
--workdir=/vDiunC \
--read-only=true \
--net=none \
--user=1067:200 \
--rm \
--cidfile=/tmp/tmpxnebrbyw/20241009111014-914477.cid \
--env=TMPDIR=/tmp \
--env=HOME=/vDiunC \
ubuntu:22.04 \
echo
[2024-10-09T11:10:15+0200] [MainThread] [I] [cwltool] [job minimal_example.second_pass.pass] completed success
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil.fileStores.abstractFileStore] LOG-TO-MASTER: CWL step complete: minimal_example.second_pass.pass
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil.worker] Completed body for 'CWLJob' minimal_example.second_pass.pass kind-CWLJob/instance-0qldyysz v5
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil.worker] Not chaining from job 'CWLJob' minimal_example.second_pass.pass kind-CWLJob/instance-0qldyysz v5
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil.worker] Finished running the chain of jobs on this node, we ran for a total of 2.167389 seconds
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil.leader] Issued job 'CWLJob' minimal_example.pass.pass kind-CWLJob/instance-0qldyysz v1 with job batch system ID: 2 and disk: 1.0 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil] Running Toil version 7.0.0-d569ea5711eb310ffd5703803f7250ebf7c19576 on host transitix.
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil.worker] Working on job 'ResolveIndirect' minimal_example._resolve kind-ResolveIndirect/instance-a_q701n9 v1
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil.worker] Loaded body Job('ResolveIndirect' minimal_example._resolve kind-ResolveIndirect/instance-a_q701n9 v1) from description 'ResolveIndirect' minimal_example._resolve kind-ResolveIndirect/instance-a_q701n9 v1
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil.worker] Completed body for 'ResolveIndirect' minimal_example._resolve kind-ResolveIndirect/instance-a_q701n9 v3
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil.worker] Not chaining from job 'ResolveIndirect' minimal_example._resolve kind-ResolveIndirect/instance-a_q701n9 v3
[2024-10-09T11:10:15+0200] [MainThread] [I] [toil.worker] Finished running the chain of jobs on this node, we ran for a total of 0.009180 seconds
[2024-10-09T11:10:16+0200] [Thread-2] [I] [toil.statsAndLogging] Got message from job at time 10-09-2024 11:10:16: CWL step complete: minimal_example.pass.pass
[2024-10-09T11:10:16+0200] [Thread-2] [I] [toil.statsAndLogging] Got message from job at time 10-09-2024 11:10:16: CWL step complete: minimal_example.second_pass.pass
[2024-10-09T11:10:16+0200] [Thread-2] [I] [toil.statsAndLogging] minimal_example.pass.pass.stdout follows:
=========>
<=========
[2024-10-09T11:10:16+0200] [Thread-2] [I] [toil.statsAndLogging] minimal_example.second_pass.pass.stdout follows:
=========>
<=========
[2024-10-09T11:10:19+0200] [MainThread] [I] [toil.leader] Finished toil run successfully.
[2024-10-09T11:10:19+0200] [MainThread] [I] [toil.cwl.cwltoil] Collecting workflow outputs...
[2024-10-09T11:10:19+0200] [MainThread] [I] [toil.cwl.cwltoil] Stored workflow outputs
[2024-10-09T11:10:19+0200] [MainThread] [I] [toil.cwl.cwltoil] Computing output file checksums...
[2024-10-09T11:10:19+0200] [MainThread] [I] [toil.cwl.cwltoil] CWL run complete!
[2024-10-09T11:10:19+0200] [MainThread] [I] [toil.common] Successfully deleted the job store: FileJobStore(/tmp/tmpbr31xux0)
I would expect two things:
- cwltool should not vary the mounts according to the selected listing (it behaves correctly in the first call, but not in the second)
- toil-cwl-runner should behave like cwltool in that respect, i.e. only mount the parent directory
It does look like cwltool is doing the wrong thing.
That said, regardless of the LoadListingRequirement, the Docker mount should just be the top-level directory. For this workflow at least, there is no reason to have more than one mount of the top-level directory.
@mr-c
Is there a reason why cwltool behaves this way? On the Toil side, this issue only occurs when --bypass-file-store is passed, in which case Toil calls into cwltool code. We're not sure what it is about the cwltool/Toil PathMapper setup that causes this behavior.
➤ Adam Novak commented:
Our options for moving forward might be writing a conformance test for the CWL test suite to make sure the right things are exposed to expressions, or digging into PathMapper to see why the cwltool PathMapper we use when bypassing the file store makes all these mounts. (I'm not sure a bunch of superfluous mounts is actually non-conformant, though.)
I think if we want to fix this, we need to turn it into a PR against the CWL conformance tests that adds a test which currently fails but should pass.
Any updates on this issue?
We are still unable to run a significant fraction of our workflows with Toil (they all work with cwltool), but in certain environments Toil would be the preferred choice.
- Alex
The behaviour in the minimal example is unrelated to LoadListingRequirement (and I don't think this issue as a whole is caused by it either). Running cwltool and toil-cwl-runner on pass.cwl with debug.json both produce a listing in the output. I believe that is because LoadListingRequirement only controls the listing visible to expressions. Changing the output in the CWL to:
outputEval: $(JSON.stringify(inputs.msin[0]))
...
class: InlineJavascriptRequirement
shows that the listing seen by the expression is controlled by the requirement.
The issue seems to be related to how we populate the listing when running the job (at least at some point after Job.run() is called). This listing ends up in the output of the first pass, and since the first pass returns its input as its output (outputEval: $(inputs.msin)) and that output becomes the input of the second pass (I believe populating the listing at runtime like this is correct per the spec), the second pass's input differs slightly because of the extra listings.
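To illustrate the difference between the two passes' inputs, here is a hypothetical helper (trim_listing) that cuts a Directory object's listing back to a given loadListing level. It is only a sketch of one possible normalization, not Toil code, and whether trimming a listing that was supplied in the input is allowed by the spec is an open question:

```python
import copy
from typing import Any


def trim_listing(obj: dict[str, Any], load_listing: str) -> dict[str, Any]:
    """Return a copy of a CWL File/Directory object with its listing cut to load_listing."""
    result = copy.deepcopy(obj)  # never mutate the caller's object
    if result.get("class") != "Directory":
        return result
    if load_listing == "no_listing":
        result.pop("listing", None)
    elif load_listing == "shallow_listing":
        # Keep the direct children, but drop any listing nested below them.
        for child in result.get("listing", []):
            if child.get("class") == "Directory":
                child.pop("listing", None)
    elif load_listing != "deep_listing":
        raise ValueError(f"unknown loadListing value: {load_listing}")
    return result
```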
When that input reaches the Docker mount step, something tries to mount every entry in the listing (I assume this is due to the PathMapper setup when we hand back to cwltool?).
What we should probably do is inspect these listings and remove entries whose mounts would be redundant (for example, subdirectories already covered by their parent directory's mount), which should fix the Docker mount problems.
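A sketch of the pruning idea, assuming mounts are (source, target) pairs like the ones in the docker commands above and that a nested mount is only redundant when it lands at the same relative path under its ancestor's target. prune_redundant_mounts is a hypothetical name, and it ignores mount options such as readonly, which a real fix would also need to compare:

```python
from pathlib import PurePosixPath


def prune_redundant_mounts(mounts: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop (source, target) bind mounts that an ancestor mount already exposes."""
    # Shallowest sources first, so parents are kept before their children are checked.
    ordered = sorted(set(mounts), key=lambda m: (len(PurePosixPath(m[0]).parts), m))
    kept: list[tuple[str, str]] = []
    for source, target in ordered:
        src, dst = PurePosixPath(source), PurePosixPath(target)
        covered = False
        for k_source, k_target in kept:
            ks, kt = PurePosixPath(k_source), PurePosixPath(k_target)
            # Redundant only if it sits under a kept mount at the same relative path.
            if ks in src.parents and dst == kt / src.relative_to(ks):
                covered = True
                break
        if not covered:
            kept.append((source, target))
    return kept
```

Applied to the four /home/alex/debug/builds mounts from the second_pass call, it keeps only the top-level builds mount.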
I just need to be careful about cases where some, but not all, of the listings are removed, since I'm not sure which Toil code controls populating the listing.