schema_salad
Unexpected Error due to Duplicate `Content-Type` Headers in `cwltool`
When I run the following command with the latest version of `cwltool`:
$ cwltool https://sandbox.zenodo.org/record/1016630/files/trimming_and_qc.cwl --help
I encounter the error below:
While fetching https://sandbox.zenodo.org/record/1016630/files/trimming_and_qc.cwl, got content-type of 'application/octet-stream, application/octet-stream'. Expected one of ['text/plain', 'application/json', 'text/vnd.yaml', 'text/yaml', 'text/x-yaml', 'application/x-yaml', 'application/octet-stream'].
I thought this might be related to the fix I provided in the past at:
https://github.com/common-workflow-language/cwltool/pull/1622
However, upon closer inspection, I noticed that the content-type is duplicated: `application/octet-stream, application/octet-stream`.
I fetched the actual headers using `curl` and observed:
$ curl -D - https://sandbox.zenodo.org/record/1016630/files/trimming_and_qc.cwl
...
Content-Type: application/octet-stream
...
Content-Type: application/octet-stream
...
It seems there are two Content-Type lines.
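HTTP allows a recipient to combine repeated field lines with the same name into one comma-separated value, which would explain the string in the error message. A minimal sketch of that combination, using only the standard library; the header block is a simplified stand-in for the `curl` output, and `combined_header` is a hypothetical helper, not part of `requests` or `schema_salad`:

```python
from email import message_from_string

# Simplified header block modelled on the curl output above (assumption:
# only the fields relevant to this issue are kept).
RAW_HEADERS = (
    "Content-Type: application/octet-stream\n"
    "Content-Length: 1151\n"
    "Content-Type: application/octet-stream\n"
    "\n"
)


def combined_header(raw_headers: str, name: str) -> str:
    """Join every occurrence of a header field into one comma-separated value."""
    message = message_from_string(raw_headers)
    # get_all() returns one entry per field line, or None if the field is absent.
    values = message.get_all(name) or []
    return ", ".join(values)


print(combined_header(RAW_HEADERS, "Content-Type"))
```

With two identical `Content-Type` lines, the combined value no longer equals any single entry in the expected list, so an exact string comparison fails.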
I suspect the code around: https://github.com/common-workflow-language/schema_salad/blob/e16612a7cf2d6cd9138aafdec20958452be3b611/schema_salad/fetcher.py#L75
might be related to this issue, but I'm not sure about the exact solution. Could you please look into this?
Thanks for the report!
It seems that repeating HTTP header fields is valid.
I would rename `content_type` to `received_content_types` and also add a `.split(",")` to make it a list.
https://github.com/common-workflow-language/schema_salad/blob/e16612a7cf2d6cd9138aafdec20958452be3b611/schema_salad/fetcher.py#L79
Then we can check whether there is no intersection between the two sets/lists (`content_types.isdisjoint(received_content_types)`) and throw the error as before if so.
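The suggestion above can be sketched roughly as follows. This is an illustration, not the actual patch: `parse_content_types` and `is_acceptable` are hypothetical names, and stripping parameters such as `; charset=utf-8` is an extra assumption beyond the plain `.split(",")` that was proposed.

```python
from typing import Iterable, Set


def parse_content_types(header_value: str) -> Set[str]:
    """Split a (possibly repeated) Content-Type value into bare media types."""
    media_types = set()
    for item in header_value.split(","):
        # Drop parameters such as "; charset=utf-8" and normalise case.
        media_type = item.split(";")[0].strip().lower()
        if media_type:
            media_types.add(media_type)
    return media_types


def is_acceptable(header_value: str, content_types: Iterable[str]) -> bool:
    """Return True if at least one received media type is in the expected list."""
    received_content_types = parse_content_types(header_value)
    # isdisjoint() is True when the two collections share no element.
    return not received_content_types.isdisjoint(content_types)


expected = ["text/plain", "application/json", "application/octet-stream"]
print(is_acceptable("application/octet-stream, application/octet-stream", expected))
print(is_acceptable("text/html", expected))
```

Calling `isdisjoint` on the received set rather than on `content_types` lets `content_types` remain a plain list, since the method accepts any iterable as its argument.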
Thank you, Mr. @mr-c . Should I create the PR? (Tazro seems to want this fix done sooner rather than later.)
@suecharo yes, that would be great. Thank you
My Environment
- OS: Ubuntu 22.04.3 LTS (Jammy Jellyfish)
- Python: 3.10.12
Steps to Reproduce
Run the following command:
$ docker run -it --rm -v "$PWD":"$PWD" -w="$PWD" quay.io/commonwl/cwltool:3.1.20220628170238 https://zenodo.org/api/files/2422dda0-1bd9-4109-aa44-53d55fd934de/download-sra.cwl --help
INFO /usr/local/bin/cwltool 3.1
While fetching https://zenodo.org/api/files/2422dda0-1bd9-4109-aa44-53d55fd934de/download-sra.cwl, got content-type of 'application/octet-stream, application/octet-stream'. Expected one of ['text/plain', 'application/json', 'text/vnd.yaml', 'text/yaml', 'text/x-yaml', 'application/x-yaml', 'application/octet-stream']
Development and Testing
Setting up `schema_salad` in a virtual environment.
# === build and install ===
$ git clone --depth 1 https://github.com/suecharo/schema_salad && cd schema_salad
$ which python3
/usr/bin/python3
$ python3 -m venv .
$ source ./bin/activate
(schema_salad) $ which python3
/home/suecharo/git/github.com/suecharo/schema_salad/bin/python3
(schema_salad) $ readlink $(which python3)
/usr/bin/python3
(schema_salad) $ which pip
/home/suecharo/git/github.com/suecharo/schema_salad/bin/pip
(schema_salad) $ readlink $(which pip)
(schema_salad) $ pip install -e .
...
Successfully installed CacheControl-0.13.1 certifi-2023.7.22 charset-normalizer-3.3.0 filelock-3.12.4 idna-3.4 importlib-resources-6.1.0 isodate-0.6.1 mistune-2.0.5 msgpack-1.0.7 mypy-extensions-1.0.0 pyparsing-3.1.1 rdflib-7.0.0 requests-2.31.0 ruamel.yaml-0.17.33 ruamel.yaml.clib-0.2.7 schema-salad-0.1.dev1258+ge16612a six-1.16.0 urllib3-2.0.6
(schema_salad) $ pip list
Package Version Editable project location
------------------- -------------------- ---------------------------------------------------
CacheControl 0.13.1
certifi 2023.7.22
charset-normalizer 3.3.0
filelock 3.12.4
idna 3.4
importlib-resources 6.1.0
isodate 0.6.1
mistune 2.0.5
msgpack 1.0.7
mypy-extensions 1.0.0
pip 22.0.2
pyparsing 3.1.1
rdflib 7.0.0
requests 2.31.0
ruamel.yaml 0.17.33
ruamel.yaml.clib 0.2.7
schema-salad 0.1.dev1258+ge16612a /home/suecharo/git/github.com/suecharo/schema_salad
setuptools 59.6.0
six 1.16.0
urllib3 2.0.6
(schema_salad) $ ls ./bin/
activate activate.fish csv2rdf normalizer pip3 python python3.10 rdfgraphisomorphism rdfs2dot schema-salad-tool
activate.csh Activate.ps1 doesitcache pip pip3.10 python3 rdf2dot rdfpipe schema-salad-doc
Installing `cwltool` in a virtual environment using the editable `schema_salad`.
# cwl-utils
(schema_salad) $ git clone --depth 1 https://github.com/common-workflow-language/cwl-utils.git
(schema_salad) $ cd cwl-utils
(schema_salad) $ vim ./requirements.txt
# Edit schema_salad version to the editable one.
(schema_salad) $ pip install -e .
...
Successfully installed cwl-upgrader-1.2.9 cwl-utils-0.29 packaging-23.2
# cwltool
(schema_salad) $ git clone --depth 1 https://github.com/common-workflow-language/cwltool.git
(schema_salad) $ cd cwltool
(schema_salad) $ vim ./setup.py
# Edit schema_salad and cwl-utils version to the editable one.
(schema_salad) $ pip install -e .
...
Successfully installed argcomplete-3.1.2 coloredlogs-15.0.1 cwltool-3.1 humanfriendly-10.0 lxml-4.9.3 networkx-3.1 prov-1.5.1 psutil-5.9.5 pydot-1.4.2 python-dateutil-2.8.2 shellescape-3.8.1
(schema_salad) $ pip list
Package Version Editable project location
------------------- -------------------- -------------------------------------------------------------
argcomplete 3.1.2
CacheControl 0.13.1
certifi 2023.7.22
charset-normalizer 3.3.0
coloredlogs 15.0.1
cwl-upgrader 1.2.9
cwl-utils 0.29 /home/suecharo/git/github.com/suecharo/schema_salad/cwl-utils
cwltool 3.1 /home/suecharo/git/github.com/suecharo/schema_salad/cwltool
filelock 3.12.4
humanfriendly 10.0
idna 3.4
importlib-resources 6.1.0
isodate 0.6.1
lxml 4.9.3
mistune 2.0.5
msgpack 1.0.7
mypy-extensions 1.0.0
networkx 3.1
packaging 23.2
pip 22.0.2
prov 1.5.1
psutil 5.9.5
pydot 1.4.2
pyparsing 3.1.1
python-dateutil 2.8.2
rdflib 7.0.0
requests 2.31.0
ruamel.yaml 0.17.33
ruamel.yaml.clib 0.2.7
schema-salad 0.1.dev1258+ge16612a /home/suecharo/git/github.com/suecharo/schema_salad
setuptools 59.6.0
shellescape 3.8.1
six 1.16.0
urllib3 2.0.6
Before attempting to fix the issue, I ran the following command to confirm that the issue is reproducible.
(schema_salad) $ cwltool https://sandbox.zenodo.org/record/1016630/files/trimming_and_qc.cwl --help
INFO /home/suecharo/git/github.com/suecharo/schema_salad/bin/cwltool 3.1
usage: https://sandbox.zenodo.org/record/1016630/files/trimming_and_qc.cwl [-h] --fastq_1 FASTQ_1
--fastq_2 FASTQ_2
[--nthreads NTHREADS]
[job_order]
The error was not reproducible. :thinking:
This made me suspect that the error might be specific to the container `quay.io/commonwl/cwltool:3.1.20220628170238`.
The container is probably using an older version of `schema_salad`.
I added print statements to `fetcher.py` to investigate further:
try:
    headers = {}
    if content_types:
        headers["Accept"] = ", ".join(content_types) + ", */*;q=0.8"
    resp = self.session.get(url, headers=headers)
    resp.raise_for_status()
except Exception as e:
    raise ValidationException(f"Error fetching {url}: {e}") from e
# === added ===
print("=== resp.headers ===")
print(resp.headers)
print("=== resp.headers['content-type'] ===")
print(resp.headers["content-type"])
Then I ran the following command again:
(schema_salad) $ cwltool https://sandbox.zenodo.org/record/1016630/files/trimming_and_qc.cwl --help
INFO /home/suecharo/git/github.com/suecharo/schema_salad/bin/cwltool 3.1
=== resp.headers ===
{'Server': 'nginx', 'Date': 'Thu, 05 Oct 2023 02:33:36 GMT', 'Content-Length': '1151', 'Content-Disposition': 'attachment; filename=trimming_and_qc.cwl', 'Accept-Ranges': 'none, bytes', 'Set-Cookie': 'session=9779c6ebbc5f63d_651e2080.LCWpzVkPaLmiYEY9UkKCpqimCS8; Expires=Sun, 05-Nov-2023 02:33:36 GMT; Secure; HttpOnly; Path=/', 'OC-Checksum': 'MD5:415878c78ed8265bd7367099cf2254f7', 'Content-Security-Policy': "default-src 'none';", 'X-Content-Type-Options': 'nosniff', 'X-Download-Options': 'noopen', 'X-Permitted-Cross-Domain-Policies': 'none', 'X-Frame-Options': 'sameorigin', 'X-XSS-Protection': '1; mode=block', 'ETag': '"md5:415878c78ed8265bd7367099cf2254f7"', 'X-RateLimit-Limit': '60', 'X-RateLimit-Remaining': '59', 'X-RateLimit-Reset': '1696473276', 'Retry-After': '59', 'Strict-Transport-Security': 'max-age=0', 'Referrer-Policy': 'strict-origin-when-cross-origin'}
=== resp.headers['content-type'] ===
ERROR I'm sorry, I couldn't load this CWL file, try again with --debug for more information.
The error was: 'content-type'
The output showed that the response returned by the `requests` library had no `content-type` header at all, so the lookup failed. :thinking:
@mr-c, in summary: the error I originally encountered will probably be resolved by updating the cwltool container. However, while debugging I noticed that the `requests` response does not contain the content-type header in this case. Just wanted to report this to you.
@suecharo I tested your example with the latest `cwltool` and `schema_salad` dev branches, and I get the original error that you reported. Then I tried again in a clean virtualenv and I received the new error about the missing `content-type` header!
Looking into it, I think that when we get a cached response the `content-type` header is missing. Delete the `~/.cache/salad` directory and try again. When I did that, I got the original error again.
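Both failure modes (a duplicated header and a header that is missing from a cached response) could be handled in one check. A minimal sketch under stated assumptions: `check_content_type` is a hypothetical helper, `ValueError` stands in for `ValidationException`, and a plain dict with a lowercase key stands in for `resp.headers`, which is case-insensitive in `requests`.

```python
from typing import Iterable, Mapping


def check_content_type(
    url: str, headers: Mapping[str, str], content_types: Iterable[str]
) -> None:
    """Raise if none of the received media types is expected; skip if header is absent."""
    # .get() avoids the KeyError seen when the header is missing entirely.
    header_value = headers.get("content-type")
    if header_value is None:
        return  # nothing to validate, e.g. a cached response without the header
    received_content_types = {
        item.split(";")[0].strip().lower() for item in header_value.split(",")
    }
    if received_content_types.isdisjoint(content_types):
        # ValueError is a stand-in for schema_salad's ValidationException.
        raise ValueError(
            f"While fetching {url}, got content-type of {header_value!r}. "
            f"Expected one of {list(content_types)}."
        )


expected = ["text/plain", "application/octet-stream"]
# Neither call raises: one header is duplicated, the other is missing.
check_content_type(
    "https://example.org/a.cwl",
    {"content-type": "application/octet-stream, application/octet-stream"},
    expected,
)
check_content_type("https://example.org/a.cwl", {}, expected)
```

Whether a missing header should be accepted silently or reported is a design choice; this sketch accepts it so that cached responses keep working.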
https://github.com/common-workflow-language/schema_salad/pull/754 created.