In some scenarios `fetch_text` returns empty string
This issue is related to schema_salad.fetcher.DefaultFetcher.fetch_text, and it is a bit difficult to reproduce. I have found it this late afternoon, when I tried next cwltool commands:
cwltool --debug --validate https://raw.githubusercontent.com/inab/Wetlab2Variations/eosc-life/cwl-workflows/workflows/workflow.cwl
cwltool --debug --print-deps https://raw.githubusercontent.com/inab/Wetlab2Variations/eosc-life/cwl-workflows/workflows/workflow.cwl
The issue is difficult to reproduce as it has appeared for me only from wireless. But I was unable to reproduce it from my workstation, for instance. When it fails the output is similar to next, but the intermediate stack trace is variable:
(.37) jmfernandez@olympo[10]:/tmp/ik> cwltool --debug --validate https://raw.githubusercontent.com/inab/Wetlab2Variations/eosc-life/cwl-workflows/workflows/workflow.cwl
INFO /tmp/ik/.37/bin/cwltool 3.1.20211020155521
ERROR I'm sorry, I couldn't load this CWL file.
The error was:
Traceback (most recent call last):
File "/tmp/ik/.37/lib/python3.7/site-packages/cwltool/main.py", line 1067, in main
tool = make_tool(uri, loadingContext)
File "/tmp/ik/.37/lib/python3.7/site-packages/cwltool/load_tool.py", line 470, in make_tool
tool = loadingContext.construct_tool_object(processobj, loadingContext)
File "/tmp/ik/.37/lib/python3.7/site-packages/cwltool/workflow.py", line 51, in default_make_tool
return Workflow(toolpath_object, loadingContext)
File "/tmp/ik/.37/lib/python3.7/site-packages/cwltool/workflow.py", line 106, in __init__
step, index, loadingContext, loadingContext.prov_obj
File "/tmp/ik/.37/lib/python3.7/site-packages/cwltool/workflow.py", line 150, in make_workflow_step
return WorkflowStep(toolpath_object, pos, loadingContext, parentworkflowProv)
File "/tmp/ik/.37/lib/python3.7/site-packages/cwltool/workflow.py", line 234, in __init__
self.embedded_tool = load_tool(toolpath_object["run"], loadingContext)
File "/tmp/ik/.37/lib/python3.7/site-packages/cwltool/load_tool.py", line 486, in load_tool
loadingContext, workflowobj, uri = fetch_document(argsworkflow, loadingContext)
File "/tmp/ik/.37/lib/python3.7/site-packages/cwltool/load_tool.py", line 133, in fetch_document
loadingContext.loader.fetch(fileuri, content_types=CWL_CONTENT_TYPES),
File "schema_salad/ref_resolver.py", line 985, in fetch
StopIteration
Digging in cwltool and later in schema_salad I found that, sometimes next block returns what it is expected, sometimes it returns an empty string:
https://github.com/common-workflow-language/schema_salad/blob/5d45ae6cdc4b9896d6e305b4372cb8aef3e42238/schema_salad/fetcher.py#L59-L75
It seems some weird behavior in requests library, where it does not materialize the full response of the request when the property is read (but I'm guessing, as I'm not a frequent user of such library). I guess that because I did a quick'n'dirty test, substituting requests library calls by urllib.request ones (see code below), and it started working:
if scheme in ["http", "https"] and self.session is not None:
try:
headers = {}
if content_types:
headers["Accept"] = ", ".join(content_types) + ", */*;q=0.8"
req = urllib.request.Request(url, headers=headers)
#resp = self.session.get(url, headers=headers)
#resp.raise_for_status()
resp = urllib.request.urlopen(req)
except Exception as e:
raise ValidationException(f"Error fetching {url}: {e}") from e
if content_types and "content-type" in resp.headers:
content_type = resp.headers["content-type"].split(";")[:1][0]
if content_type not in content_types:
_logger.warning(
f"While fetching {url}, got content-type of "
f"'{content_type}'. Expected one of {content_types}."
)
raw_content = resp.read()
resp.close()
encoding = resp.headers.get_content_charset()
return raw_content.decode('utf-8' if encoding is None else encoding)
# return resp.text
Hope this helps to anyone else having this issue!
@jmfernandez Thanks for the report, it explains some recent failures I'm seeing (also using raw.github links): https://github.com/common-workflow-language/cwltool/runs/4051426410?check_suite_focus=true#step:8:3833
Either a dependency of requests got updated, or there was a change in GitHub's infrastructure..