Toil doesn't support list of formats in CWL
Hello,
I have a wf of 2 steps. The first step unzips a file, which can be either FASTA or FASTQ, so I set the input format to [ edam:format_1929, edam:format_1930 ] and the returned file has the format $(inputs.target_reads.format). The second step requires FASTQ as input and calculates one statistic of the input file.
I can run my wf with the cwltool that is installed alongside Toil, and it works fine:
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020.
Please upgrade your Python as the Python 2.7 version of cwltool won't be
maintained after that date.
""", category=CWLToolDeprecationWarning)
../toil-venv-3.23.1/bin/cwltool 1.0.20190906054215
../toil-venv-3.23.1/lib/python2.7/site-packages/cwltool/__init__.py:17: CWLToolDeprecationWarning:
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020.
Please upgrade your Python as the Python 2.7 version of cwltool won't be
maintained after that date.
""", category=CWLToolDeprecationWarning)
INFO ../toil-venv-3.23.1/bin/cwltool 1.0.20190906054215
INFO Resolved 'wf.cwl' to 'file://.../wf.cwl'
INFO [workflow ] start
INFO [workflow ] starting step unzip_reads
INFO [step unzip_reads] start
INFO [job unzip_reads] /scratch/miWQ8n$ gunzip \
-c \
/scratch/tmplRS90G/stgf9ddd6d2-663c-44d3-a7b0-4f194ecd7a7f/test.fastq.gz > /scratch/miWQ8n/unziped-file
INFO [job unzip_reads] completed success
INFO [step unzip_reads] completed success
INFO [workflow ] starting step count_submitted_reads
INFO [step count_submitted_reads] start
INFO [job count_submitted_reads] /scratch/Y3RUne$ bash \
-c \
'expr $(cat /scratch/tmpI7geLD/stg25fdb735-350e-41e1-994e-7fac2d819f97/unziped-file | wc -l) / 4' > /scratch/Y3RUne/count
INFO [job count_submitted_reads] completed success
INFO [step count_submitted_reads] completed success
INFO [workflow ] completed success
{}
INFO Final process status is success
The same wf fails when I run it with Toil:
WARNING toil.leader: kind- INFO:toil:Running Toil version 3.23.1-56ad62624e7d659a7b0fbd7f1e097b880a7833e9.
WARNING toil.leader: kind-Got workflow error
WARNING toil.leader: kind-Traceback (most recent call last):
WARNING toil.leader: kind-File "../toil-venv-3.23.1/lib/python2.7/site-packages/cwltool/executors.py", line 169, in run_jobs
WARNING toil.leader: kind-for job in jobiter:
WARNING toil.leader: kind-File "../toil-venv-3.23.1/lib/python2.7/site-packages/cwltool/command_line_tool.py", line 430, in job
WARNING toil.leader: kind-builder = self._init_job(job_order, runtimeContext)
WARNING toil.leader: kind-File "../toil-venv-3.23.1/lib/python2.7/site-packages/cwltool/process.py", line 718, in _init_job
WARNING toil.leader: kind-discover_secondaryFiles=getdefault(runtime_context.toplevel, False)))
WARNING toil.leader: kind-File "../toil-venv-3.23.1/lib/python2.7/site-packages/cwltool/builder.py", line 276, in bind_input
WARNING toil.leader: kind- bindings.extend(self.bind_input(f, datum[f["name"]], lead_pos=lead_pos, tail_pos=f["name"], discover_secondaryFiles=discover_secondaryFiles))
WARNING toil.leader: kind-File ".../toil-venv-3.23.1/lib/python2.7/site-packages/cwltool/builder.py", line 349, in bind_input
WARNING toil.leader: kind-self.formatgraph)
WARNING toil.leader: kind-File "../toil-venv-3.23.1/lib/python2.7/site-packages/cwltool/builder.py", line 109, in check_format
WARNING toil.leader: kind-formatSubclassOf(afile["format"], inpf, ontology, set()):
WARNING toil.leader: kind-File "../toil-venv-3.23.1/lib/python2.7/site-packages/cwltool/builder.py", line 78, in formatSubclassOf
WARNING toil.leader: kind-for s, p, o in ontology.triples((uriRefFmt, RDFS.subClassOf, None)):
WARNING toil.leader: kind-AttributeError: 'str' object has no attribute 'triples'
WARNING toil.leader: kind-ERROR:cwltool:Got workflow error
WARNING toil.leader: kind-Traceback (most recent call last):
.......
WorkflowException: 'str' object has no attribute 'triples'
WARNING toil.leader: kind-CWLWorkflow/instanceNz7kbI ERROR:toil.worker:Exiting the worker because of a failed job on host hx-noah-70-08
WARNING toil.leader: kind-CWLWorkflow/instanceNz7kbI WARNING:toil.jobGraph:Due to failure we are reducing the remaining retry count of job 'file://...multiple-gunzip.cwl' gunzip -c kind-CWLWorkflow/instanceNz7kbI with ID kind-CWLWorkflow/instanceNz7kbI to 1
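For anyone reading the traceback above, here is a minimal sketch of why a list of formats could trigger the error while a single exactly matching format might not. It is a simplified stand-in modeled on the function names in the traceback (check_format, formatSubclassOf), not cwltool's actual code. FakeOntology, describe and the plain-string comparison are assumptions made for illustration, and the sketch assumes the check tests for equality before consulting the ontology.

```python
# Simplified stand-in for the format check seen in the traceback.
# NOT cwltool's real implementation; FakeOntology is a hypothetical
# replacement for a parsed ontology graph.

EDAM = "http://edamontology.org/"
FASTA = EDAM + "format_1929"
FASTQ = EDAM + "format_1930"


class FakeOntology:
    """Hypothetical ontology holding (child, parent) subclass pairs."""

    def __init__(self, subclass_pairs):
        self._pairs = set(subclass_pairs)

    def triples(self, pattern):
        # Yield (child, predicate, parent) for every pair whose child matches.
        subject, _predicate, _obj = pattern
        for child, parent in self._pairs:
            if child == subject:
                yield (child, "subClassOf", parent)


def format_subclass_of(fmt, cls, ontology, visited):
    """Return True if fmt equals cls or is a transitive subclass of it."""
    if fmt == cls:
        return True
    if ontology is None or fmt in visited:
        return False
    visited.add(fmt)
    # This attribute lookup fails when `ontology` is a plain string.
    for _s, _p, parent in ontology.triples((fmt, "subClassOf", None)):
        if format_subclass_of(parent, cls, ontology, visited):
            return True
    return False


def check_format(file_format, accepted_formats, ontology):
    """Return True if file_format satisfies any accepted format."""
    for accepted in accepted_formats:
        if file_format == accepted or format_subclass_of(
            file_format, accepted, ontology, set()
        ):
            return True
    return False


def describe(file_format, accepted_formats, ontology):
    """Run the check and report the outcome as a short string."""
    try:
        ok = check_format(file_format, accepted_formats, ontology)
        return "ok" if ok else "rejected"
    except AttributeError as err:
        return "error: {}".format(err)


if __name__ == "__main__":
    # Exact match: the ontology is never consulted.
    print(describe(FASTQ, [FASTQ], "not-a-graph"))
    # List of formats: the first entry differs, so the ontology is queried.
    print(describe(FASTQ, [FASTA, FASTQ], "not-a-graph"))
    # Same list with a usable (here: empty) ontology object.
    print(describe(FASTQ, [FASTA, FASTQ], FakeOntology([])))
```

In this sketch a FASTQ file checked against [FASTA, FASTQ] reaches the ontology lookup on the first list entry, so an ontology that is a string rather than a graph object fails before the second entry is ever compared.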
Run command:
source toil-venv-3.23.1/bin/activate
export JOB_STORE=..
export OUT_TOOL=..
rm -rf $JOB_STORE
mkdir -p $OUT_TOOL
cwltool --no-container wf.cwl wf.yml
cwltoil \
--no-container \
--batchSystem LSF \
--disableCaching \
--defaultMemory 10G \
--defaultCores 8 \
--jobStore ${JOB_STORE} \
--outdir ${OUT_TOOL} \
--retryCount 3 \
--logFile log \
--stats \
wf.cwl wf.yml
I tested Toil versions 3.19, 3.21 and 3.23.1, and all of them have this issue. All necessary files are attached: formats.zip
Thank you! Kate
┆Issue is synchronized with this Jira Story ┆Issue Number: TOIL-512
@mr-c Is this feature not part of the CWL 1.0 conformance tests?
@KeteSakharova Running this on your provided files seems to work for me locally on python3.6. I did not attempt it with --batchSystem LSF, but running it on python3.6 might fix it?
I installed from source with this:
git clone https://github.com/DataBiosphere/toil.git && cd toil && virtualenv -p python3.6 v3nv && . v3nv/bin/activate && make prepare && make develop extras=[all]
And ran with:
(v3nv) quokka@qcore:~/Desktop/formats$ toil-cwl-runner \
> --no-container \
> --disableCaching \
> --defaultMemory 10G \
> --defaultCores 8 \
> --jobStore /home/quokka/Desktop/test_jobstore \
> --outdir /home/quokka/Desktop/test_outdir \
> --logFile log \
> --stats \
> wf.cwl wf.yml
qcore 2020-02-13 22:38:12,272 MainThread INFO cwltool: Resolved 'wf.cwl' to 'file:///home/quokka/Desktop/formats/wf.cwl'
qcore 2020-02-13 22:38:24,570 MainThread WARNING rdflib.term: https://schema.org/docs/!DOCTYPE html does not look like a valid URI, trying to serialize this will break.
qcore 2020-02-13 22:38:24,570 MainThread WARNING rdflib.term: https://schema.org/docs/html lang="en" does not look like a valid URI, trying to serialize this will break.
/home/quokka/release/toil/v3nv/lib/python3.6/site-packages/rdflib/plugins/parsers/structureddata.py:30: UserWarning: html5lib not found! RDFa and Microdata parsers will not be available.
'parsers will not be available.')
Could not load extension schema https://schema.org/docs/schema_org_rdfa.html: html5lib is not installed, cannot use RDFa and Microdata parsers.
qcore 2020-02-13 22:38:24,572 MainThread WARNING salad: Could not load extension schema https://schema.org/docs/schema_org_rdfa.html: html5lib is not installed, cannot use RDFa and Microdata parsers.
qcore 2020-02-13 22:38:26,196 MainThread INFO botocore.credentials: Found credentials in shared credentials file: ~/.aws/credentials
qcore 2020-02-13 22:38:26,282 MainThread WARNING toil.batchSystems.singleMachine: Limiting maxCores to CPU count of system (8).
qcore 2020-02-13 22:38:26,282 MainThread WARNING toil.batchSystems.singleMachine: Limiting maxMemory to physically available memory (67488010240).
qcore 2020-02-13 22:38:26,283 MainThread WARNING toil.batchSystems.singleMachine: Limiting maxDisk to physically available disk (25730785280).
qcore 2020-02-13 22:38:26,340 MainThread INFO toil: Running Toil version 4.0.0a1-5932040c0542443f26fbb6b8805cc8fbe9791118.
INFO:toil.worker:Redirecting logging to /tmp/toil-962eb1a2-36bf-4377-86e7-24940bdb90d4-4a42c04d15cb49babb4ceb6b89bf64ad/tmpe3shx8_v/worker_log.txt
qcore 2020-02-13 22:38:30,363 MainThread INFO toil.leader: Finished toil run successfully.
One other note:
(v3nv) quokka@qcore:~/Desktop/formats$ cwltool --version
/home/quokka/release/toil/v3nv/bin/cwltool 2.0.20200126090152
We're currently (today in fact) transitioning toil to python3.6+ only and so we're trying to make that the new standard. Also, in line with breaking changes, we've dropped the name cwltoil, which we intended to deprecate a while back, and are now using the name toil-cwl-runner in all future releases (though this will currently only be the case if installed from source).
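Since the entry point name differs between installs, a wrapper script can look for both names. This is only a sketch: find_cwl_runner is a hypothetical helper, and it only checks what is on PATH in the active environment.

```python
import shutil


def find_cwl_runner(candidates=("toil-cwl-runner", "cwltoil")):
    """Return (name, path) for the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return name, path
    return None


if __name__ == "__main__":
    found = find_cwl_runner()
    if found is None:
        print("No Toil CWL entry point found on PATH")
    else:
        print("Using {} at {}".format(*found))
```

The new name is listed first so that an environment with both entry points prefers toil-cwl-runner.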
Hi @DailyDreaming,
Thank you for your answer! Following your advice, I installed Toil with Python 3.6.1 and ran my example again on LSF:
toil-cwl-runner --batchSystem LSF --no-container --disableCaching --defaultMemory 10G --defaultCores 8 --jobStore test_jobstore --outdir test_outdir --logFile log --stats wf.cwl wf.yml
log:
MainThread INFO cwltool: Resolved 'wf.cwl' to 'file:///hps/nobackup2/production/metagenomics/pipeline/testing/formats/wf.cwl'
MainThread WARNING rdflib.term: https://schema.org/docs/!DOCTYPE html does not look like a valid URI, trying to serialize this will break.
MainThread WARNING rdflib.term: https://schema.org/docs/html lang="en" does not look like a valid URI, trying to serialize this will break.
/hps/nobackup/production/metagenomics/software/toil/v3nv/lib/python3.6/site-packages/rdflib/plugins/parsers/structureddata.py:30: UserWarning: html5lib not found! RDFa and Microdata parsers will not be available.
'parsers will not be available.')
Could not load extension schema https://schema.org/docs/schema_org_rdfa.html: html5lib is not installed, cannot use RDFa and Microdata parsers.
MainThread WARNING salad: Could not load extension schema https://schema.org/docs/schema_org_rdfa.html: html5lib is not installed, cannot use RDFa and Microdata parsers.
MainThread WARNING toil.batchSystems.singleMachine: Limiting maxMemory to physically available memory (270056525824).
MainThread WARNING toil.batchSystems.singleMachine: Limiting maxDisk to physically available disk (43359649792).
MainThread INFO toil: Running Toil version 4.0.0a1-993be0c3d95c83ca969d783e708c48d918407b16.
INFO:toil.worker:Redirecting logging to /scratch/node-cfc331eb-7973-4fe6-9824-cb01db67d7c2-4c851d71-4470-427d-930d-37533136b43e/tmpdg4eoeec/worker_log.txt
MainThread WARNING toil.leader: A result seems to already have been processed for job 0
Thread-644 ERROR toil.batchSystems.lsf: bjobs detected job exit code 1 for job 2182461
MainThread WARNING toil.leader: Job failed with exit value 1: 'CWLWorkflow' kind-CWLWorkflow/instance9fjoc4hj
MainThread WARNING toil.leader: Despite the batch system claiming failure the job 'CWLWorkflow' kind-CWLWorkflow/instance9fjoc4hj seems to have finished and been removed
MainThread INFO toil.leader: Finished toil run successfully.
I'm a bit confused by the ERROR, since Toil finished successfully in this case. Is this the correct behaviour? Anyway, I no longer see the issue with formats, but now I don't understand why the job failed.
Kate
@KeteSakharova Sorry for the delayed response. Could you check the log file? I believe, in this case, it is: /scratch/node-cfc331eb-7973-4fe6-9824-cb01db67d7c2-4c851d71-4470-427d-930d-37533136b43e/tmpdg4eoeec/worker_log.txt
Adding the option --debugWorker may also help. Not sure exactly why that error log message would pop up either. Does it work for you if you attempt to run it locally without LSF?
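To pick the ERROR entries out of the --logFile output and see whether the leader still reported success, a small script along these lines may help. It is a rough sketch: the regular expression is an assumption based on the log lines pasted in this thread ("<thread> <LEVEL> <logger>: <message>", optionally preceded by host and timestamp), and summarize is a hypothetical helper, not part of Toil.

```python
import re

# Assumed line shape, taken from the logs in this thread:
#   [host date time] <thread> <LEVEL> <logger>: <message>
LINE = re.compile(
    r"(?P<thread>\S+) (?P<level>INFO|WARNING|ERROR) "
    r"(?P<logger>[\w.]+): (?P<message>.*)"
)


def summarize(log_text):
    """Collect ERROR entries and note whether the leader reported success."""
    errors = []
    finished_ok = False
    for line in log_text.splitlines():
        match = LINE.search(line)
        if match is None:
            continue  # Lines in other formats are skipped, not parsed.
        if match.group("level") == "ERROR":
            errors.append((match.group("logger"), match.group("message")))
        if (
            match.group("logger") == "toil.leader"
            and "Finished toil run successfully" in match.group("message")
        ):
            finished_ok = True
    return {"errors": errors, "finished_ok": finished_ok}


if __name__ == "__main__":
    import sys

    # Usage: python summarize_log.py log
    with open(sys.argv[1]) as handle:
        print(summarize(handle.read()))
```

A run that reports both a non-empty error list and finished_ok set to True is the confusing case described above, so it is worth flagging rather than treating either signal alone as the final status.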
Hi @DailyDreaming,
I can't find this worker_log.txt (No such file or directory). I checked the --logFile log, and it has the same output as I posted above. Yes, the pipeline works locally, and also with cwltool.
The log with --debugWorker is as follows:
MainThread INFO cwltool: Resolved 'wf.cwl' to 'file:///hps/nobackup2/production/metagenomics/pipeline/testing/kate/formats/wf.cwl'
noah-login-01.ebi.ac.uk 2020-03-23 13:46:45,925 MainThread WARNING rdflib.term: https://schema.org/docs/!DOCTYPE html does not look like a valid URI, trying to serialize this will break.
noah-login-01.ebi.ac.uk 2020-03-23 13:46:45,925 MainThread WARNING rdflib.term: https://schema.org/docs/html lang="en" does not look like a valid URI, trying to serialize this will break.
/hps/nobackup/production/metagenomics/software/toil-v3.23/v3nv/lib/python3.6/site-packages/rdflib/plugins/parsers/structureddata.py:30: UserWarning: html5lib not found! RDFa and Microdata parsers will not be available.
'parsers will not be available.')
Could not load extension schema https://schema.org/docs/schema_org_rdfa.html: html5lib is not installed, cannot use RDFa and Microdata parsers.
noah-login-01.ebi.ac.uk 2020-03-23 13:46:45,929 MainThread WARNING salad: Could not load extension schema https://schema.org/docs/schema_org_rdfa.html: html5lib is not installed, cannot use RDFa and Microdata parsers.
noah-login-01.ebi.ac.uk 2020-03-23 13:46:52,198 MainThread WARNING toil.batchSystems.singleMachine: Limiting maxMemory to physically available memory (268670373888).
noah-login-01.ebi.ac.uk 2020-03-23 13:46:52,198 MainThread WARNING toil.batchSystems.singleMachine: Limiting maxDisk to physically available disk (164755951616).
noah-login-01.ebi.ac.uk 2020-03-23 13:46:52,354 MainThread INFO toil: Running Toil version 4.0.0a1-56ad62624e7d659a7b0fbd7f1e097b880a7833e9-dirty.
noah-login-01.ebi.ac.uk 2020-03-23 13:46:52,364 MainThread INFO toil.worker: ---TOIL WORKER OUTPUT LOG---
noah-login-01.ebi.ac.uk 2020-03-23 13:46:52,364 MainThread INFO toil: Running Toil version 4.0.0a1-56ad62624e7d659a7b0fbd7f1e097b880a7833e9-dirty.
noah-login-01.ebi.ac.uk 2020-03-23 13:47:03,908 MainThread INFO cwltool: [job multiple-gunzip.cwl] /scratch/node-90fd8383-c369-49ea-88e8-09aee97f200f-abab5292-fd59-43f3-bc75-c4c95baf6a3b/tmps3mx3293/cf1a5226-c63e-4927-9b3b-5ca0bf41bc70/trydbjttb/tmp-outn774da4y$ gunzip \
-c \
/scratch/tmp753qu45k/stg2b58c327-176b-44b8-8fad-f860501c548a/test.fastq.gz > /scratch/node-90fd8383-c369-49ea-88e8-09aee97f200f-abab5292-fd59-43f3-bc75-c4c95baf6a3b/tmps3mx3293/cf1a5226-c63e-4927-9b3b-5ca0bf41bc70/trydbjttb/tmp-outn774da4y/unziped-file
noah-login-01.ebi.ac.uk 2020-03-23 13:47:03,924 MainThread INFO cwltool: [job multiple-gunzip.cwl] completed success
noah-login-01.ebi.ac.uk 2020-03-23 13:47:04,391 MainThread INFO cwltool: [job step2.cwl] /scratch/node-90fd8383-c369-49ea-88e8-09aee97f200f-abab5292-fd59-43f3-bc75-c4c95baf6a3b/tmps3mx3293/894dc58b-a491-45f2-b20c-54c488ba7c05/tinrbdgfd/tmp-out_d1dzyhe$ bash \
-c \
'expr $(cat /scratch/tmp_e99c8wo/stgcf76791b-474b-483c-b1a0-8ca3c772b82d/unziped-file | wc -l) / 4' > /scratch/node-90fd8383-c369-49ea-88e8-09aee97f200f-abab5292-fd59-43f3-bc75-c4c95baf6a3b/tmps3mx3293/894dc58b-a491-45f2-b20c-54c488ba7c05/tinrbdgfd/tmp-out_d1dzyhe/count
noah-login-01.ebi.ac.uk 2020-03-23 13:47:04,500 MainThread INFO cwltool: [job step2.cwl] completed success
noah-login-01.ebi.ac.uk 2020-03-23 13:47:04,570 MainThread INFO toil.worker: Finished running the chain of jobs on this node, we ran for a total of 12.205915 seconds
noah-login-01.ebi.ac.uk 2020-03-23 13:47:04,587 MainThread WARNING toil.leader: A result seems to already have been processed for job 0
noah-login-01.ebi.ac.uk 2020-03-23 13:58:25,346 Thread-4 ERROR toil.batchSystems.lsf: bjobs detected job exit code 1 for job 3494573
noah-login-01.ebi.ac.uk 2020-03-23 13:58:25,352 MainThread WARNING toil.leader: Job failed with exit value 1: 'CWLWorkflow' kind-CWLWorkflow/instance41t5z8yq
noah-login-01.ebi.ac.uk 2020-03-23 13:58:25,368 MainThread WARNING toil.leader: Despite the batch system claiming failure the job 'CWLWorkflow' kind-CWLWorkflow/instance41t5z8yq seems to have finished and been removed
noah-login-01.ebi.ac.uk 2020-03-23 13:58:28,652 MainThread INFO toil.leader: Finished toil run successfully.
We should see if we can replicate either of these issues still. It's not clear that the LSF runs that worked actually had a working RDF parser installed, and it's not clear that the LSF batch system was even working given the complaints about a duplicate job result.
We could probably try this locally and maybe on Slurm; we don't have LSF.
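Before re-running, it may be worth confirming which parser packages are importable in the virtualenv, since the logs above show "html5lib not found" warnings. A minimal check, assuming rdflib and html5lib are the modules of interest (parser_status is a hypothetical helper; whether html5lib matters for the format check itself is not established in this thread):

```python
import importlib.util


def parser_status(modules=("rdflib", "html5lib")):
    """Map each module name to True if it can be found in this environment."""
    return {
        name: importlib.util.find_spec(name) is not None for name in modules
    }


if __name__ == "__main__":
    for name, found in parser_status().items():
        print("{}: {}".format(name, "installed" if found else "MISSING"))
```

Running this with the same interpreter that runs toil-cwl-runner shows what the leader and workers can import, provided they share that environment.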