mhcquant
Add cache management for `ms2rescore`
Description of feature
I'm running the `dev` branch of mhcquant under the test profile on my local server:
git clone --branch dev --single-branch git@github.com:nf-core/mhcquant.git mhcquant
nextflow run ./main.nf -profile test,docker --outdir test
I keep running into the same error:
Apr-25 13:01:55.598 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_MHCQUANT:MHCQUANT:MS2RESCORE (HepG2_1)'
Caused by:
Process `NFCORE_MHCQUANT:MHCQUANT:MS2RESCORE (HepG2_1)` terminated with an error exit status (1)
Command executed:
ms2rescore_cli.py \
--psm_file HepG2_1.idXML \
--spectrum_path . \
--output_path HepG2_1_ms2rescore.idXML \
--processes 2 \
--ms2_tolerance 0.02 --ms2pip_model Immuno-HCD --rescoring_engine percolator --feature_generators deeplc,ms2pip
cat <<-END_VERSIONS > versions.yml
"NFCORE_MHCQUANT:MHCQUANT:MS2RESCORE":
MS²Rescore: $(echo $(ms2rescore --version 2>&1) | grep -oP 'MS²Rescore \(v\K[^\)]+' ))
END_VERSIONS
Command exit status:
1
Command output:
(empty)
Command error:
File "/usr/local/lib/python3.10/socket.py", line 824, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
File "/usr/local/lib/python3.10/socket.py", line 955, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data2023/suna/proj/mhcquant/bin/ms2rescore_cli.py", line 175, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/data2023/suna/proj/mhcquant/bin/ms2rescore_cli.py", line 171, in main
rescore_idxml(kwargs["psm_file"], kwargs["output_path"], config)
File "/data2023/suna/proj/mhcquant/bin/ms2rescore_cli.py", line 81, in rescore_idxml
rescore(config, psm_list)
File "/usr/local/lib/python3.10/site-packages/ms2rescore/core.py", line 80, in rescore
fgen.add_features(psm_list)
File "/usr/local/lib/python3.10/site-packages/ms2rescore/feature_generators/ms2pip.py", line 190, in add_features
ms2pip_results = correlate(
File "/usr/local/lib/python3.10/site-packages/ms2pip/core.py", line 178, in correlate
ms2pip_parallelized = _Parallelized(
File "/usr/local/lib/python3.10/site-packages/ms2pip/core.py", line 383, in __init__
validate_requested_xgb_model(
File "/usr/local/lib/python3.10/site-packages/ms2pip/_utils/xgb_models.py", line 21, in validate_requested_xgb_model
_download_model(model_file, xgboost_model_hashes[model_file], model_dir)
File "/usr/local/lib/python3.10/site-packages/ms2pip/_utils/xgb_models.py", line 98, in _download_model
urllib.request.urlretrieve(
File "/usr/local/lib/python3.10/urllib/request.py", line 241, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "/usr/local/lib/python3.10/urllib/request.py", line 216, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/lib/python3.10/urllib/request.py", line 519, in open
response = self._open(req, data)
File "/usr/local/lib/python3.10/urllib/request.py", line 536, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/usr/local/lib/python3.10/urllib/request.py", line 496, in _call_chain
result = func(*args)
File "/usr/local/lib/python3.10/urllib/request.py", line 1377, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/usr/local/lib/python3.10/urllib/request.py", line 1351, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
In my case, when the pipeline module calls the `ms2rescore_cli.py` script, `ms2rescore.rescore` tries to download an XGBoost model file (about 300 MB in this case) into the Docker container. This fails whenever the network connection is poor.
- The downloading function called by `ms2rescore` is here
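The behaviour being requested amounts to a cache-first fetch: look for the model file in a local directory, verify it, and fall back to downloading only when it is missing or corrupt. The sketch below is hypothetical and is not ms2pip's or ms2rescore's actual code. `fetch_model`, the SHA-256 check, and the injectable `download` callable are all assumptions. ms2pip keeps its own per-model hashes, which is what the `xgboost_model_hashes[model_file]` lookup in the traceback refers to.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable, Optional


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a ~300 MB model is never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def urllib_download(url: str, dest: Path) -> None:
    """Default downloader, using only the standard library."""
    urllib.request.urlretrieve(url, dest)


def fetch_model(
    filename: str,
    expected_sha256: str,
    cache_dir: Path,
    url: Optional[str] = None,
    download: Callable[[str, Path], None] = urllib_download,
) -> Path:
    """Return a verified model file, downloading only on a cache miss.

    Hypothetical helper (assumption), not part of ms2pip or ms2rescore.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename
    # Cache hit: a pre-populated file with the right hash needs no network access.
    if target.is_file() and sha256sum(target) == expected_sha256:
        return target
    if url is None:
        raise FileNotFoundError(
            f"{filename} missing or corrupt in {cache_dir} and no URL was given"
        )
    # Cache miss: download under a temporary name first, so an interrupted
    # transfer never leaves a truncated file that looks like a valid model.
    partial = target.with_name(target.name + ".part")
    download(url, partial)
    if sha256sum(partial) != expected_sha256:
        partial.unlink()
        raise ValueError(f"hash mismatch for downloaded {filename}")
    partial.replace(target)
    return target
```

If a directory filled in advance on a machine with network access were mounted into the container and used as `cache_dir`, the cache-hit branch would return without any network call, so a name-resolution failure like the one above would not be reached.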
My request: could we have an option for `ms2rescore` to use a cache that can be prepared in advance?
- Something like what nf-core/mag does with the BUSCO lineage dataset. When `params.busco_db` is set to a local path, the process tries to read the local directory or `.tar.gz` archive and, if that succeeds, skips the download. Otherwise, if `params.busco_db` is a URL, the process downloads the BUSCO reference from there.
- I'm not familiar with Python, but as far as I know this approach is possible.
- I think adding the `model_dir` argument here in `ms2rescore/core.py` is a must. Updating `nextflow.config` accordingly on our side is easy, but could we ask the maintainers of compomics/ms2rescore to accept this request?
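The BUSCO-style resolution described in the request could look roughly like this for a model directory parameter: an existing directory is used as-is, a local `.tar.gz` archive is unpacked, and only a URL triggers a download. This is a hypothetical sketch. `resolve_model_dir` is not an existing mhcquant, ms2rescore, or ms2pip function. The traceback shows that ms2pip's `_download_model` already receives a `model_dir`, but the resolved directory would still need a `model_dir`-style option exposed through ms2rescore (the change this issue asks for) to reach it.

```python
import tarfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

ARCHIVE_SUFFIXES = (".tar.gz", ".tgz")


def _extract(archive: Path, work_dir: Path) -> Path:
    """Extract a .tar.gz archive into work_dir/<archive name without suffixes>."""
    dest = work_dir / archive.name.split(".")[0]
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # The "data" filter (where this Python provides it) refuses members
        # that would land outside dest, as well as special files.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest


def resolve_model_dir(source: str, work_dir: Path) -> Path:
    """Turn a user-supplied model source into a local directory (hypothetical helper)."""
    path = Path(source)
    # 1. Local directory: a pre-populated cache, used directly with no download.
    if path.is_dir():
        return path
    # 2. Local archive: unpack it, still without touching the network.
    if path.is_file() and path.name.endswith(ARCHIVE_SUFFIXES):
        return _extract(path, work_dir)
    # 3. Remote URL: download once, then unpack if it is an archive.
    parsed = urlparse(source)
    if parsed.scheme in {"http", "https", "ftp"}:
        work_dir.mkdir(parents=True, exist_ok=True)
        name = Path(parsed.path).name or "model_download"
        local = work_dir / name
        urllib.request.urlretrieve(source, local)
        if name.endswith(ARCHIVE_SUFFIXES):
            return _extract(local, work_dir)
        return work_dir
    raise ValueError(f"{source!r} is not a directory, a .tar.gz archive, or a URL")
```

As with `params.busco_db`, the local branches are checked before the URL branch, so an offline run only fails when neither a directory nor an archive is available.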