
Improve handling of very large numbers of peptides: make the merging step more efficient, implement batches

Open • MarcIsak opened this issue 4 years ago • 1 comment

Hi,

I tried to run MS2PIP v3.6.1 from the command line on Ubuntu 18.04, passing the config text file and the .peprec file as follows:

ms2pip -c config_ms2pip.txt -n 12 precursors.peprec

The computation starts and runs to the very end, but the process is then killed for an unknown reason. The console prints the following:

merging results.... killed

I also tried reducing the number of CPUs to 6, but the process was killed in the same way.

I cannot find any output file (I assume it should be written to the working folder, since one only specifies the output format and not the location), so my layman's guess is that the program has a bug somewhere.

Please find attached the input files I used with ms2pip.

Best,

Marc

ms2pip_files.zip

MarcIsak, Apr 22 '20 13:04

Hi Marc,

Thanks for sharing your input files! MS²PIP does indeed have a problem here: when predicting a large number of spectra (>200 000), merging the predictions after the parallelized steps takes a disproportionate amount of time and RAM. We plan to fix this in the near future by predicting the spectra in multiple batches.
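The batching idea mentioned above can be sketched in Python roughly as follows. This is a minimal, hypothetical illustration, not MS²PIP's actual implementation: `predict_batch` and `write_results` are placeholder hooks standing in for the prediction and output steps, and the default batch size of 50 000 is an arbitrary assumption. Each batch's predictions are written out before the next batch starts, so the memory this loop needs depends on the batch size rather than on the total number of peptides, and there is no single large merge at the end.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")  # input item type, e.g. one PEPREC row
R = TypeVar("R")  # result type, e.g. one predicted spectrum


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def predict_in_batches(
    peptides: Iterable[T],
    predict_batch: Callable[[List[T]], List[R]],  # hypothetical hook, not MS2PIP API
    write_results: Callable[[List[R]], None],     # hypothetical hook, not MS2PIP API
    batch_size: int = 50_000,                     # arbitrary assumed default
) -> int:
    """Predict and write one batch at a time; return the number of batches.

    This function keeps at most one batch of predictions alive, instead of
    collecting all predictions and merging them at the end.
    """
    n_batches = 0
    for batch in iter_batches(peptides, batch_size):
        # Results go straight to the writer and are not accumulated here.
        write_results(predict_batch(batch))
        n_batches += 1
    return n_batches
```

For example, with a toy "prediction" that doubles each input, `predict_in_batches(range(10), lambda b: [x * 2 for x in b], out.extend, batch_size=4)` processes three batches (4 + 4 + 2 items). In a real pipeline, `write_results` would append to an output file rather than to a list.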

In the meantime, I ran the predictions for your input file by splitting it into multiple parts. You can find the split input files and predictions, as well as the merged predictions, here. If you have any further questions, I'd be happy to help.
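The split-and-merge workaround can be approximated with a short Python sketch. It assumes, as an illustration rather than a description of the exact procedure used here, that every part keeps the original PEPREC header line and that the per-part prediction outputs are CSV files sharing one header. `split_peprec` and `merge_csv_files` are hypothetical helpers, and the `_partNNN` file naming is an assumption. Each part would be run through `ms2pip` separately, as in the command above, and the merge streams rows one at a time so it never holds all predictions in memory.

```python
import csv
from pathlib import Path
from typing import List, Sequence, Union

PathLike = Union[str, Path]


def _write_part(out_dir: Path, src: Path, index: int, header: str, lines: List[str]) -> Path:
    """Write one part file (header + lines); naming scheme is hypothetical."""
    part = out_dir / f"{src.stem}_part{index:03d}{src.suffix}"
    with part.open("w") as f:
        f.write(header)
        f.writelines(lines)
    return part


def split_peprec(peprec_path: PathLike, out_dir: PathLike, chunk_size: int) -> List[Path]:
    """Split a PEPREC file into parts of at most chunk_size peptides each,
    repeating the header line at the top of every part."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    src, out = Path(peprec_path), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    parts: List[Path] = []
    with src.open() as f:
        header = f.readline()
        chunk: List[str] = []
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            chunk.append(line)
            if len(chunk) == chunk_size:
                parts.append(_write_part(out, src, len(parts), header, chunk))
                chunk = []
        if chunk:  # remaining peptides that did not fill a whole part
            parts.append(_write_part(out, src, len(parts), header, chunk))
    return parts


def merge_csv_files(paths: Sequence[PathLike], out_path: PathLike) -> int:
    """Concatenate CSV files with identical headers, writing the header once.
    Rows are streamed one by one; returns the number of data rows written."""
    header = None
    n_rows = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"header mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    n_rows += 1
    return n_rows
```

Because the `spec_id` values are copied unchanged into the parts, they stay unique across the merged output, so the merged predictions can still be matched back to the original PEPREC entries.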

Best, Ralf

RalfG, Apr 24 '20 17:04