cwltool --print-deps fails with workflows having namespaced location steps
I have been trying to print the list of dependencies of several workflows that declare some of their steps outside their own workflow repository. I need this in order to capture the list of all the CWL URLs involved in a workflow.
Expected Behavior
For instance, if you test the following workflow:
git clone https://github.com/Sage-Bionetworks-Challenges/data-to-model-challenge-workflow
cd data-to-model-challenge-workflow
cwltool --print-deps --relative-deps primary workflow.cwl
it prints the list of both local and remote CWL dependencies.
Actual Behavior
However, with workflows where the base location of the steps is declared as a namespace, and that namespace is then used as a prefix to declare each step location more concisely, the same operation fails:
git clone https://github.com/pvanheus/lukasa/
cd lukasa
cwltool --print-deps --relative-deps primary protein_evidence_mapping.cwl
ERROR Tool definition failed validation:
Unsupported scheme in url: bio-cwl-tools:samtools/samtools_faidx.cwl
Workflow Code
You can see an example of a workflow that uses a namespace to provide the "prefix" for locating its steps at https://github.com/pvanheus/lukasa/blob/main/protein_evidence_mapping.cwl . More examples are available in other GitHub repos.
Full Traceback
INFO /tmp/testi/.v/bin/cwltool 3.1.20221109155812
INFO Resolved 'protein_evidence_mapping.cwl' to 'file:///tmp/testi/lukasa/protein_evidence_mapping.cwl'
ERROR Tool definition failed validation:
Unsupported scheme in url: bio-cwl-tools:samtools/samtools_faidx.cwl
Traceback (most recent call last):
  File "/tmp/testi/.v/lib/python3.8/site-packages/cwltool/main.py", line 1117, in main
    printdeps(
  File "/tmp/testi/.v/lib/python3.8/site-packages/cwltool/main.py", line 570, in printdeps
    deps = find_deps(obj, document_loader, uri, basedir=basedir, nestdirs=nestdirs)
  File "/tmp/testi/.v/lib/python3.8/site-packages/cwltool/main.py", line 617, in find_deps
    sfs = scandeps(
  File "/tmp/testi/.v/lib/python3.8/site-packages/cwltool/process.py", line 1339, in scandeps
    scandeps(
  File "/tmp/testi/.v/lib/python3.8/site-packages/cwltool/process.py", line 1339, in scandeps
    scandeps(
  File "/tmp/testi/.v/lib/python3.8/site-packages/cwltool/process.py", line 1302, in scandeps
    loadref(base, u2),
  File "/tmp/testi/.v/lib/python3.8/site-packages/cwltool/main.py", line 615, in loadref
    return document_loader.fetch(document_loader.fetcher.urljoin(base, uri))
  File "/tmp/testi/.v/lib/python3.8/site-packages/schema_salad/ref_resolver.py", line 995, in fetch
    text = self.fetch_text(url, content_types=content_types)
  File "/tmp/testi/.v/lib/python3.8/site-packages/schema_salad/fetcher.py", line 108, in fetch_text
    raise ValidationException(f"Unsupported scheme in url: {url}")
schema_salad.exceptions.ValidationException: Unsupported scheme in url: bio-cwl-tools:samtools/samtools_faidx.cwl
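To see why the fetcher reports an "unsupported scheme", note that a namespaced reference such as bio-cwl-tools:samtools/samtools_faidx.cwl is syntactically a valid absolute URI whose scheme is "bio-cwl-tools". The sketch below uses only Python's standard urllib.parse, not Schema Salad's own fetcher (whose urljoin is assumed here to behave similarly for this case), to show that joining the reference against the workflow's file:// base leaves it unchanged, so the unexpanded prefix reaches the fetcher as if it were a URL scheme.

```python
from urllib.parse import urljoin, urlsplit

# Base URI as reported in the log ("Resolved ... to 'file:///tmp/testi/lukasa/...'").
base = "file:///tmp/testi/lukasa/protein_evidence_mapping.cwl"
# Step reference still in its namespaced (prefix:suffix) form.
ref = "bio-cwl-tools:samtools/samtools_faidx.cwl"

# "bio-cwl-tools" contains only characters that are legal in a URI scheme,
# so the namespace prefix is parsed as the scheme.
print(urlsplit(ref).scheme)  # bio-cwl-tools

# The reference carries its own scheme (different from "file"),
# so urljoin treats it as absolute and returns it unchanged.
print(urljoin(base, ref))  # bio-cwl-tools:samtools/samtools_faidx.cwl
```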
Your Environment
- cwltool version: All tests were run with cwltool 3.1.20221109155812 on Linux.
- Environment: cwltool was installed from PyPI with pip in a Python 3.8 virtual environment.
Huh, that error message is really odd since the --validate option doesn't find any validation errors.
(cwltool):~/Development/python/workspace/lukasa$ cwltool --validate protein_evidence_mapping.cwl
INFO /home/bdepaula/mambaforge/envs/cwltool/bin/cwltool 3.1.20221109155812
INFO Resolved 'protein_evidence_mapping.cwl' to 'file:///home/bdepaula/Development/python/workspace/lukasa/protein_evidence_mapping.cwl'
protein_evidence_mapping.cwl is valid CWL.
Oh wow, I never thought about using namespace for file references that way. That is really clever.
I believe what is happening is that the find_deps function is running on the document prior to having the full schema salad preprocessing applied (this is so that files brought in by $imports get recognized as dependencies). So the find_deps function needs to apply namespaces in the URI expansion itself.
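A minimal sketch of what applying namespaces in the URI expansion could look like, assuming the workflow's $namespaces block has already been read into a plain dict. The function name expand_ref and the namespace URL in the usage lines are hypothetical placeholders, not cwltool or Schema Salad API, and the real bio-cwl-tools mapping in the lukasa workflow may differ. The idea is to rewrite a prefix:suffix reference into its full URI before joining it against the document's base, so the fetcher never sees the bare prefix.

```python
from typing import Dict
from urllib.parse import urljoin


def expand_ref(ref: str, namespaces: Dict[str, str], base: str) -> str:
    """Hypothetical helper: expand a namespaced reference, then resolve it.

    If ``ref`` has the form ``prefix:suffix`` and ``prefix`` is declared in
    ``namespaces`` (the document's ``$namespaces`` mapping), the prefix is
    replaced by its namespace URI. Anything else (relative paths, ordinary
    absolute URLs) is left for ``urljoin`` to resolve against ``base``.
    """
    prefix, sep, suffix = ref.partition(":")
    if sep and prefix in namespaces:
        ref = namespaces[prefix] + suffix
    return urljoin(base, ref)


base = "file:///tmp/testi/lukasa/protein_evidence_mapping.cwl"
# Hypothetical namespace URL, for illustration only.
namespaces = {"bio-cwl-tools": "https://example.org/bio-cwl-tools/"}

print(expand_ref("bio-cwl-tools:samtools/samtools_faidx.cwl", namespaces, base))
# https://example.org/bio-cwl-tools/samtools/samtools_faidx.cwl
print(expand_ref("tools/local_step.cwl", namespaces, base))
# file:///tmp/testi/lukasa/tools/local_step.cwl
```

Checking the prefix against the declared namespaces, rather than treating any "word:" as a prefix, keeps ordinary absolute URLs such as https://... working unchanged. The approach proposed later in the thread takes a different route: it loads and validates the document before dependency scanning, so the expansion is handled by the normal Schema Salad preprocessing.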
Ah! In that case I think we can just switch the order and load and validate the document before printdeps (and therefore find_deps) is called. I tested it and it worked; I'm just adding a test case and will then send a PR :+1:
Thanks @tetron !