
cwltool --print-deps fails with workflows having namespaced location steps

Open jmfernandez opened this issue 3 years ago • 3 comments

I have been trying to print the list of dependencies of several workflows that declare some of their steps outside their own workflow repository. I need this to capture the list of all the CWL URLs involved in a workflow.

Expected Behavior

For instance, if you test the following workflow:

git clone https://github.com/Sage-Bionetworks-Challenges/data-to-model-challenge-workflow
cd data-to-model-challenge-workflow
cwltool --print-deps --relative-deps primary workflow.cwl

it prints the list of both local and remote CWL dependencies.

Actual Behavior

However, for workflows that declare a base location as a namespace and then use that namespace prefix to write step locations in a shorter form, the following command fails:

git clone https://github.com/pvanheus/lukasa/
cd lukasa
cwltool --print-deps --relative-deps primary protein_evidence_mapping.cwl
ERROR Tool definition failed validation:
Unsupported scheme in url: bio-cwl-tools:samtools/samtools_faidx.cwl

Workflow Code

One example of a workflow that uses namespaces to provide the "prefix" for locating its steps is https://github.com/pvanheus/lukasa/blob/main/protein_evidence_mapping.cwl . More examples are available in other GitHub repositories.

Full Traceback

INFO /tmp/testi/.v/bin/cwltool 3.1.20221109155812
INFO Resolved 'protein_evidence_mapping.cwl' to 'file:///tmp/testi/lukasa/protein_evidence_mapping.cwl'
ERROR Tool definition failed validation:
Unsupported scheme in url: bio-cwl-tools:samtools/samtools_faidx.cwl
Traceback (most recent call last):
  File "/tmp/testi/.v/lib/python3.8/site-packages/cwltool/main.py", line 1117, in main
    printdeps(
  File "/tmp/testi/.v/lib/python3.8/site-packages/cwltool/main.py", line 570, in printdeps
    deps = find_deps(obj, document_loader, uri, basedir=basedir, nestdirs=nestdirs)
  File "/tmp/testi/.v/lib/python3.8/site-packages/cwltool/main.py", line 617, in find_deps
    sfs = scandeps(
  File "/tmp/testi/.v/lib/python3.8/site-packages/cwltool/process.py", line 1339, in scandeps
    scandeps(
  File "/tmp/testi/.v/lib/python3.8/site-packages/cwltool/process.py", line 1339, in scandeps
    scandeps(
  File "/tmp/testi/.v/lib/python3.8/site-packages/cwltool/process.py", line 1302, in scandeps
    loadref(base, u2),
  File "/tmp/testi/.v/lib/python3.8/site-packages/cwltool/main.py", line 615, in loadref
    return document_loader.fetch(document_loader.fetcher.urljoin(base, uri))
  File "/tmp/testi/.v/lib/python3.8/site-packages/schema_salad/ref_resolver.py", line 995, in fetch
    text = self.fetch_text(url, content_types=content_types)
  File "/tmp/testi/.v/lib/python3.8/site-packages/schema_salad/fetcher.py", line 108, in fetch_text
    raise ValidationException(f"Unsupported scheme in url: {url}")
schema_salad.exceptions.ValidationException: Unsupported scheme in url: bio-cwl-tools:samtools/samtools_faidx.cwl
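The error message makes sense once you notice that a prefixed reference is syntactically a URL with its own scheme. The sketch below uses Python's standard `urllib.parse` (not schema-salad's fetcher, whose own `urljoin` is what the traceback calls) to illustrate the effect: the text before the first `:` is read as a scheme, so joining against the workflow's `file://` base leaves the reference untouched, and a fetcher then sees a scheme it cannot handle.

```python
from urllib.parse import urljoin, urlsplit

# Base URI as reported in the log, and the prefixed step reference.
base = "file:///tmp/testi/lukasa/protein_evidence_mapping.cwl"
ref = "bio-cwl-tools:samtools/samtools_faidx.cwl"

# The text before the first ":" is parsed as a URL scheme.
scheme = urlsplit(ref).scheme
print(scheme)

# Because that scheme differs from the base's "file" scheme, urljoin
# returns the reference unchanged instead of resolving it.
joined = urljoin(base, ref)
print(joined)
```

Unless the `$namespaces` prefix is expanded before this join, the dependency scan ends up asking the fetcher for a `bio-cwl-tools:` URL.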

Your Environment

  • cwltool version: All the tests have been with cwltool 3.1.20221109155812 in Linux.
  • Environment: cwltool was installed from PyPI with pip in a virtual enviroment from Python 3.8

jmfernandez, Nov 18 '22 09:11

Huh, that error message is really odd since the --validate option doesn't find any validation errors.

(cwltool):~/Development/python/workspace/lukasa$ cwltool --validate protein_evidence_mapping.cwl
INFO /home/bdepaula/mambaforge/envs/cwltool/bin/cwltool 3.1.20221109155812
INFO Resolved 'protein_evidence_mapping.cwl' to 'file:///home/bdepaula/Development/python/workspace/lukasa/protein_evidence_mapping.cwl'
protein_evidence_mapping.cwl is valid CWL.

kinow, Nov 18 '22 09:11

Oh wow, I never thought about using namespaces for file references that way. That is really clever.

I believe what is happening is that the find_deps function is running on the document prior to having the full schema salad preprocessing applied (this is so that files brought in by $imports get recognized as dependencies). So the find_deps function needs to apply namespaces in the URI expansion itself.
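As a rough illustration of "applying namespaces in the URI expansion itself", here is a minimal Python sketch. `expand_ref` is a hypothetical helper, not cwltool's code, and the namespace URL `https://example.org/bio-cwl-tools/` is a placeholder rather than the one lukasa actually declares. It assumes the schema-salad convention that `prefix:rest` expands to the namespace URI followed by `rest`, and falls back to an ordinary join against the base for everything else.

```python
from typing import Mapping
from urllib.parse import urljoin


def expand_ref(base: str, ref: str, namespaces: Mapping[str, str]) -> str:
    """Resolve a step reference, expanding a $namespaces prefix first.

    Hypothetical helper for illustration only; not cwltool's implementation.
    """
    prefix, sep, rest = ref.partition(":")
    if sep and prefix in namespaces:
        # "bio-cwl-tools:samtools/x.cwl" -> "<namespace URI>samtools/x.cwl"
        return namespaces[prefix] + rest
    # Relative paths and absolute URLs are joined against the base as usual.
    return urljoin(base, ref)


# Placeholder namespace mapping (assumed, not taken from lukasa).
namespaces = {"bio-cwl-tools": "https://example.org/bio-cwl-tools/"}
base = "file:///tmp/testi/lukasa/protein_evidence_mapping.cwl"

print(expand_ref(base, "bio-cwl-tools:samtools/samtools_faidx.cwl", namespaces))
print(expand_ref(base, "tools/local.cwl", namespaces))
```

Checking the prefix against the declared `$namespaces` keys, rather than treating every `something:` as a namespace, keeps real schemes such as `https:` and `file:` on the normal join path.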

tetron, Nov 18 '22 15:11

Ah! In that case I think we can just switch the order: load and validate the document before printdeps (and then find_deps) is called. I tested it and it works; I'm just adding a test case and then I will send a PR :+1:

Thanks @tetron!
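To show why the ordering matters, here is a toy Python model of the two phases. Both functions are hypothetical stand-ins, not cwltool's `printdeps`/`find_deps`. The sketch assumes steps are given as a list of dicts whose `run` field is a string (real CWL also allows map-style `steps` and inline `run` objects) and uses a placeholder namespace URL. Scanning the raw document leaks the prefixed reference, while scanning after the load phase yields a fully expanded URL.

```python
from typing import Any, Dict, List
from urllib.parse import urljoin


def load_document(doc: Dict[str, Any], base: str) -> Dict[str, Any]:
    """Hypothetical 'load' phase: expand $namespaces prefixes in step 'run' fields."""
    namespaces = doc.get("$namespaces", {})
    resolved_steps = []
    for step in doc.get("steps", []):
        run = step["run"]
        prefix, sep, rest = run.partition(":")
        if sep and prefix in namespaces:
            run = namespaces[prefix] + rest
        else:
            run = urljoin(base, run)
        resolved_steps.append({**step, "run": run})
    return {**doc, "steps": resolved_steps}


def find_deps(doc: Dict[str, Any]) -> List[str]:
    """Hypothetical 'scan' phase: collect the run reference of every step."""
    return [step["run"] for step in doc.get("steps", [])]


base = "file:///tmp/testi/lukasa/protein_evidence_mapping.cwl"
raw = {
    "$namespaces": {"bio-cwl-tools": "https://example.org/bio-cwl-tools/"},
    "steps": [{"id": "faidx", "run": "bio-cwl-tools:samtools/samtools_faidx.cwl"}],
}

scan_first = find_deps(raw)                       # prefixed reference leaks through
load_first = find_deps(load_document(raw, base))  # fully expanded URL
print(scan_first)
print(load_first)
```

The trade-off tetron mentions still applies: the real scan runs before full preprocessing partly so that `$import`ed files are recognized, so a reorder has to make sure that information is not lost.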

kinow, Nov 18 '22 22:11