
Add cache management for `ms2rescore`

Open MajoroMask opened this issue 2 months ago • 0 comments

Description of feature

I'm running the dev branch of mhcquant under the test profile on my local server:

git clone --branch dev --single-branch git@github.com:nf-core/mhcquant.git mhcquant
nextflow run ./main.nf -profile test,docker --outdir test

I keep running into the same error:

Apr-25 13:01:55.598 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_MHCQUANT:MHCQUANT:MS2RESCORE (HepG2_1)'

Caused by:
  Process `NFCORE_MHCQUANT:MHCQUANT:MS2RESCORE (HepG2_1)` terminated with an error exit status (1)

Command executed:

  ms2rescore_cli.py \
      --psm_file HepG2_1.idXML \
      --spectrum_path . \
      --output_path HepG2_1_ms2rescore.idXML \
      --processes 2 \
      --ms2_tolerance 0.02 --ms2pip_model Immuno-HCD --rescoring_engine percolator --feature_generators deeplc,ms2pip
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_MHCQUANT:MHCQUANT:MS2RESCORE":
      MS²Rescore: $(echo $(ms2rescore --version 2>&1) | grep -oP 'MS²Rescore \(v\K[^\)]+' ))
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
    File "/usr/local/lib/python3.10/socket.py", line 824, in create_connection
      for res in getaddrinfo(host, port, 0, SOCK_STREAM):
    File "/usr/local/lib/python3.10/socket.py", line 955, in getaddrinfo
      for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
  socket.gaierror: [Errno -3] Temporary failure in name resolution
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "/data2023/suna/proj/mhcquant/bin/ms2rescore_cli.py", line 175, in <module>
      sys.exit(main())
    File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
      return self.main(*args, **kwargs)
    File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
      rv = self.invoke(ctx)
    File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
      return ctx.invoke(self.callback, **ctx.params)
    File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
      return __callback(*args, **kwargs)
    File "/data2023/suna/proj/mhcquant/bin/ms2rescore_cli.py", line 171, in main
      rescore_idxml(kwargs["psm_file"], kwargs["output_path"], config)
    File "/data2023/suna/proj/mhcquant/bin/ms2rescore_cli.py", line 81, in rescore_idxml
      rescore(config, psm_list)
    File "/usr/local/lib/python3.10/site-packages/ms2rescore/core.py", line 80, in rescore
      fgen.add_features(psm_list)
    File "/usr/local/lib/python3.10/site-packages/ms2rescore/feature_generators/ms2pip.py", line 190, in add_features
      ms2pip_results = correlate(
    File "/usr/local/lib/python3.10/site-packages/ms2pip/core.py", line 178, in correlate
      ms2pip_parallelized = _Parallelized(
    File "/usr/local/lib/python3.10/site-packages/ms2pip/core.py", line 383, in __init__
      validate_requested_xgb_model(
    File "/usr/local/lib/python3.10/site-packages/ms2pip/_utils/xgb_models.py", line 21, in validate_requested_xgb_model
      _download_model(model_file, xgboost_model_hashes[model_file], model_dir)
    File "/usr/local/lib/python3.10/site-packages/ms2pip/_utils/xgb_models.py", line 98, in _download_model
      urllib.request.urlretrieve(
    File "/usr/local/lib/python3.10/urllib/request.py", line 241, in urlretrieve
      with contextlib.closing(urlopen(url, data)) as fp:
    File "/usr/local/lib/python3.10/urllib/request.py", line 216, in urlopen
      return opener.open(url, data, timeout)
    File "/usr/local/lib/python3.10/urllib/request.py", line 519, in open
      response = self._open(req, data)
    File "/usr/local/lib/python3.10/urllib/request.py", line 536, in _open
      result = self._call_chain(self.handle_open, protocol, protocol +
    File "/usr/local/lib/python3.10/urllib/request.py", line 496, in _call_chain
      result = func(*args)
    File "/usr/local/lib/python3.10/urllib/request.py", line 1377, in http_open
      return self.do_open(http.client.HTTPConnection, req)
    File "/usr/local/lib/python3.10/urllib/request.py", line 1351, in do_open
      raise URLError(err)
  urllib.error.URLError: <urlopen error [Errno -3] Temporary failure in name resolution>

In my case, when the pipeline module calls the ms2rescore_cli.py script, ms2rescore.rescore tries to download an XGBoost model file (about 300 MB in this case) into the Docker container. The download fails whenever the network connection is unreliable, and the process exits with the error above.

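  • According to the traceback above, the download happens in `_download_model` (ms2pip/_utils/xgb_models.py), which ms2rescore reaches through its ms2pip feature generator.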

My request: could we have an option for ms2rescore to use a cache that can be prepared in advance?

  • This would be similar to what nf-core/mag does for the BUSCO lineage dataset. When params.busco_db is set to a local path, the process tries to read the local directory or tar.gz archive and, if that succeeds, skips the download. If params.busco_db is a URL instead, the process downloads the BUSCO reference from there.
  • I'm not familiar with Python, but as far as I know this approach is feasible.
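
The local-path-or-URL behaviour described above could look roughly like the following Python sketch. It is only an illustration of the idea: `resolve_model_dir` and its `source` and `cache_dir` parameters are hypothetical names, not part of ms2rescore, ms2pip, or mhcquant. The traceback shows a `model_dir` argument being passed to `_download_model`, so the returned directory might be usable there, but that wiring is an assumption and is not shown.

```python
import tarfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def resolve_model_dir(source: str, cache_dir: Path) -> Path:
    """Return a directory holding model files; download only if `source` is a URL.

    Hypothetical helper for illustration, not an existing ms2rescore/ms2pip API.
    """
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # Remote source: fetch once into the cache and reuse it on later runs.
        cache_dir.mkdir(parents=True, exist_ok=True)
        target = cache_dir / Path(parsed.path).name
        if not target.exists():
            urllib.request.urlretrieve(source, target)
        return cache_dir

    local = Path(source)
    if local.is_dir():
        # Pre-populated directory: use it as-is, no network access needed.
        return local
    if local.is_file() and tarfile.is_tarfile(local):
        # Local archive (e.g. tar.gz): unpack it into the cache directory.
        cache_dir.mkdir(parents=True, exist_ok=True)
        with tarfile.open(local) as archive:
            archive.extractall(cache_dir)
        return cache_dir
    raise FileNotFoundError(f"Model source not found: {source}")
```

With something like this, a user on a poor connection could download the model files once, point a pipeline parameter at the local directory or archive, and run the MS2RESCORE process without any network access inside the container.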

MajoroMask, Apr 26 '24 01:04