"File staging conflict" error is suppressed when using a container; files get overwritten
This is the same issue as described here: https://github.com/common-workflow-language/cwltool/issues/1403
Using the same CWL files as in that cwltool issue, when you run Toil with a CWL file that uses InitialWorkDirRequirement like this:
InitialWorkDirRequirement:
  listing:
    - entryname: some_dir
      writable: true
      entry: "$({class: 'Directory', listing: inputs.input_files})"
you get a File staging conflict error:
$ toil-cwl-runner workflow.cwl input.json
...
...
[2021-02-08T11:58:02-0500] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
[2021-02-08T11:58:02-0500] [MainThread] [I] [toil] Running Toil version 5.2.0-047d0c4f2949c576c80e452a0807c5be6355c63d on host server.
[2021-02-08T11:58:02-0500] [MainThread] [I] [toil.worker] Working on job 'CWLJob' bash run.sh kind-CWLJob/instance-vjb1bunf
[2021-02-08T11:58:02-0500] [MainThread] [I] [toil.worker] Loaded body Job('CWLJob' bash run.sh kind-CWLJob/instance-vjb1bunf) from description 'CWLJob' bash run.sh kind-CWLJob/instance-vjb1bunf
[2021-02-08T11:58:02-0500] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2021-02-08T11:58:02-0500] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-CWLJob/instance-unfjem45/file-cd83ea62575a44098a4fdb15a6dd6790/output.txt' to path '/tmp/node-efa42d50-baaa-41db-b0b9-d26bc000e945-ae225bf2-14b6-4c6a-9f5b-1dfe49b07a51/tmpgz22ltyt/67f17b0c-d724-45b0-8185-498cb56a6ced/tmpm3b65eu7.tmp'
[2021-02-08T11:58:02-0500] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-CWLJob/instance-7gxq1o32/file-e570db2d16284f1fb9359a454f04ba2a/output.txt' to path '/tmp/node-efa42d50-baaa-41db-b0b9-d26bc000e945-ae225bf2-14b6-4c6a-9f5b-1dfe49b07a51/tmpgz22ltyt/67f17b0c-d724-45b0-8185-498cb56a6ced/tmpeef58r6u.tmp'
[2021-02-08T11:58:02-0500] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-CWLJob/instance-unfjem45/file-cd83ea62575a44098a4fdb15a6dd6790/output.txt' to path '/tmp/node-efa42d50-baaa-41db-b0b9-d26bc000e945-ae225bf2-14b6-4c6a-9f5b-1dfe49b07a51/tmpgz22ltyt/67f17b0c-d724-45b0-8185-498cb56a6ced/tmp266g1809.tmp'
[2021-02-08T11:58:02-0500] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-CWLJob/instance-7gxq1o32/file-e570db2d16284f1fb9359a454f04ba2a/output.txt' to path '/tmp/node-efa42d50-baaa-41db-b0b9-d26bc000e945-ae225bf2-14b6-4c6a-9f5b-1dfe49b07a51/tmpgz22ltyt/67f17b0c-d724-45b0-8185-498cb56a6ced/tmpnjzcsd2f.tmp'
Traceback (most recent call last):
  File "/home/conda/lib/python3.7/site-packages/toil/worker.py", line 394, in workerScript
    job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
  File "/home/conda/lib/python3.7/site-packages/toil/job.py", line 2359, in _runner
    returnValues = self._run(jobGraph=None, fileStore=fileStore)
  File "/home/conda/lib/python3.7/site-packages/toil/job.py", line 2280, in _run
    return self.run(fileStore)
  File "/home/conda/lib/python3.7/site-packages/toil/cwl/cwltoil.py", line 1222, in run
    logger=cwllogger,
  File "/home/conda/lib/python3.7/site-packages/cwltool/executors.py", line 150, in execute
    self.run_jobs(process, job_order_object, logger, runtime_context)
  File "/home/conda/lib/python3.7/site-packages/cwltool/executors.py", line 257, in run_jobs
    job.run(runtime_context)
  File "/home/conda/lib/python3.7/site-packages/cwltool/job.py", line 566, in run
    secret_store=runtimeContext.secret_store,
  File "/home/conda/lib/python3.7/site-packages/cwltool/process.py", line 287, in stage_files
    % (targets[entry.target].resolved, entry.resolved, entry.target)
cwltool.errors.WorkflowException: File staging conflict, trying to stage both /tmp/node-efa42d50-baaa-41db-b0b9-d26bc000e945-ae225bf2-14b6-4c6a-9f5b-1dfe49b07a51/tmpgz22ltyt/67f17b0c-d724-45b0-8185-498cb56a6ced/tmpm3b65eu7.tmp and /tmp/node-efa42d50-baaa-41db-b0b9-d26bc000e945-ae225bf2-14b6-4c6a-9f5b-1dfe49b07a51/tmpgz22ltyt/67f17b0c-d724-45b0-8185-498cb56a6ced/tmpeef58r6u.tmp to the same target /tmp/node-efa42d50-baaa-41db-b0b9-d26bc000e945-ae225bf2-14b6-4c6a-9f5b-1dfe49b07a51/tmpgz22ltyt/67f17b0c-d724-45b0-8185-498cb56a6ced/tjxzkadyk/tmp-oute3hzotys/some_dir/output.txt
[2021-02-08T11:58:02-0500] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host server
However, if you add a container requirement:
DockerRequirement:
  dockerPull: ubuntu:latest
the "File staging conflict" error does not occur, and the files silently overwrite each other when they are staged into the directory:
$ toil-cwl-runner --singularity workflow.cwl input.json
...
[2021-02-08T12:05:28-0500] [MainThread] [I] [toil.leader] Finished toil run successfully.
Workflow Progress 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 (0 failures) [00:50<00:00, 0.20 jobs/s]
{
  "output_file": {
    "location": "file:///home/_test_cwl_initialDir/output.txt",
    "basename": "output.txt",
    "nameroot": "output",
    "nameext": ".txt",
    "class": "File",
    "checksum": "sha1$24c5362ecd1bfafba85f185f28b12c121d03ee12",
    "size": 8
  }
}
[2021-02-08T12:05:28-0500] [MainThread] [I] [toil.common] Successfully deleted the job store: FileJobStore(/tmp/tmp3iv1zy24)
$ cat output.txt
Sample2
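The silent overwrite can be reproduced outside Toil with a minimal sketch. Copying two different files that share the basename `output.txt` into one directory, with no conflict check, leaves only the last one. The contents `Sample1`/`Sample2` mirror this report. The `stage_into` helper and the temporary paths are hypothetical; this is not how Toil or Singularity actually stage files.

```python
import os
import shutil
import tempfile


def stage_into(target_dir, sources):
    """Hypothetical naive stager: copy each source into target_dir by basename.

    No conflict check is done, so a later file with the same basename
    silently replaces an earlier one.
    """
    os.makedirs(target_dir, exist_ok=True)
    for src in sources:
        shutil.copy(src, os.path.join(target_dir, os.path.basename(src)))


with tempfile.TemporaryDirectory() as tmp:
    # Two distinct inputs that happen to share the basename "output.txt".
    sources = []
    for sample in ("Sample1", "Sample2"):
        sample_dir = os.path.join(tmp, sample)
        os.makedirs(sample_dir)
        path = os.path.join(sample_dir, "output.txt")
        with open(path, "w") as fh:
            fh.write(sample + "\n")
        sources.append(path)

    some_dir = os.path.join(tmp, "some_dir")
    stage_into(some_dir, sources)

    staged = os.listdir(some_dir)
    with open(os.path.join(some_dir, "output.txt")) as fh:
        content = fh.read()

print(staged)   # ['output.txt']
print(content)  # Sample2
```

The surviving file holds `Sample2\n`, which is 8 bytes. That matches the `"size": 8` reported for the output above.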
The "File staging conflict" error should be raised in this case too, so that you can be sure you are not silently losing files in your workflow.
Issue is synchronized with this Jira Task. Issue Number: TOIL-793
@stevekm Thank you for raising this issue. A fix in cwltool should also fix this in Toil. We'll try to stay apprised of developments there and update accordingly.
It looks like this problem arises when the user's CWL code constructs a CWL Directory object whose listing is not actually valid for a Directory: it contains multiple entries with the same name and so can never be physically realized on disk.
This doesn't happen merely when you take two input File objects that happen to share a basename and, e.g., pass them to a command line tool. That works regardless of whether Singularity is in use, right?
cwltool can sometimes detect a broken Directory at the file staging step. Instead of showing you an arbitrary one of the conflicting files, it then fails the whole workflow. When Singularity is used, however, it just shows you an arbitrary one of the files. Toil inherits essentially the same behavior because it calls into cwltool.
I feel like this should really be detected at a different point. The InitialWorkDirRequirement listing should be pre-checked before we attempt to stage it. If the listing tree is self-contradictory, we should refuse to continue without trying to actually stage anything.
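The staging-time check can be sketched as a map from each staging target to the file that claims it. A second, different source for the same target is a conflict. This is a simplified, hypothetical illustration modeled on the error message in the traceback in this issue (`trying to stage both X and Y to the same target Z`). It is not cwltool's actual `stage_files` code. The `Staging` tuple, `StagingConflict` and `build_staging_map` names are assumptions, and the paths are shortened placeholders.

```python
from typing import Dict, Iterable, NamedTuple


class Staging(NamedTuple):
    resolved: str  # path of the source file on the host
    target: str    # path it should appear at in the working directory


class StagingConflict(Exception):
    pass


def build_staging_map(entries: Iterable[Staging]) -> Dict[str, Staging]:
    """Hypothetical sketch: refuse to stage two different files to one target.

    Staging the same source to the same target twice is tolerated here.
    """
    targets: Dict[str, Staging] = {}
    for entry in entries:
        existing = targets.get(entry.target)
        if existing is not None and existing.resolved != entry.resolved:
            raise StagingConflict(
                "File staging conflict, trying to stage both %s and %s "
                "to the same target %s"
                % (existing.resolved, entry.resolved, entry.target)
            )
        targets[entry.target] = entry
    return targets


# Shortened, hypothetical paths modeled on the log in this issue.
entries = [
    Staging("/tmp/inputs/tmpm3b65eu7.tmp", "/work/some_dir/output.txt"),
    Staging("/tmp/inputs/tmpeef58r6u.tmp", "/work/some_dir/output.txt"),
]
try:
    build_staging_map(entries)
except StagingConflict as err:
    print(err)
```

A stager that skips a check like this ends up with the last-one-wins result described in this thread.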
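A pre-check along these lines could walk the Directory listing produced by the InitialWorkDirRequirement expression. It would reject any directory that contains two entries with the same name before anything is staged. This is a minimal sketch. It assumes the listing is available as plain CWL-style dicts with `class`, `basename` (falling back to the last component of `location`) and an optional nested `listing`. `find_listing_conflicts` is a hypothetical name, not an existing cwltool or Toil function.

```python
from collections import defaultdict
from typing import Any, Dict, List


def entry_name(entry: Dict[str, Any]) -> str:
    """Name an entry would have on disk: its basename, else the tail of its location."""
    if "basename" in entry:
        return entry["basename"]
    return entry["location"].rstrip("/").rsplit("/", 1)[-1]


def find_listing_conflicts(directory: Dict[str, Any], path: str) -> List[str]:
    """Return 'path/name' for every name claimed by more than one entry, recursively."""
    seen: Dict[str, int] = defaultdict(int)
    conflicts: List[str] = []
    for entry in directory.get("listing", []):
        name = entry_name(entry)
        seen[name] += 1
        if seen[name] == 2:  # report each duplicated name once
            conflicts.append(path + "/" + name)
        if entry.get("class") == "Directory":
            conflicts.extend(find_listing_conflicts(entry, path + "/" + name))
    return conflicts


# Mirrors "$({class: 'Directory', listing: inputs.input_files})" when two
# inputs share the basename output.txt (locations are hypothetical).
some_dir = {
    "class": "Directory",
    "listing": [
        {"class": "File", "location": "file:///data/run1/output.txt"},
        {"class": "File", "location": "file:///data/run2/output.txt"},
    ],
}
print(find_listing_conflicts(some_dir, "some_dir"))  # ['some_dir/output.txt']
```

A non-empty result would be grounds to fail the job up front, independent of which container runtime (if any) does the staging.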
I'm also not sure this is a Toil fix, though.
I've tested this on Toil commit 5280227633703372ce06923f2164bcf1aad65a0d and without the DockerRequirement we get the file staging conflict. With the DockerRequirement we don't, but I also see both Sample1 and Sample2 in the output file:
{
  "output_file": {
    "location": "file:///Users/anovak/workspace/toil/output.txt",
    "basename": "output.txt",
    "nameroot": "output",
    "nameext": ".txt",
    "class": "File",
    "checksum": "sha1$4c092e98f5dc520a853c1f2f1db4e1b14fe2955d",
    "size": 16
  }
}
[2023-04-20T16:23:00-0400] [MainThread] [I] [toil.common] Successfully deleted the job store: FileJobStore(/var/folders/0n/4y413_9s7y70lmm3yhtt3b8m0000gq/T/tmp4oyctazw)
(venv) [anovak@swords toil]% cat output.txt
Sample1
Sample2
So it could be that cwltool changed something here.
Do we really want to force the file staging conflict error if cwltool is working around it? Probably for portability...
It could also be that I am testing with Docker, and we run into the problem with loss of files only on Singularity.