ms2rescore
Uncaught exception in DeepLCFeatureGenerator if not enough peptides for calibration set
I'm getting an uncaught exception when trying to use ms2rescore.feature_generators.deeplc.DeepLCFeatureGenerator. The error happens when psm_list does not contain enough peptides for the calibration set.
Here's how I create the environment:
C:\python\python309\python.exe -m venv venv_309_ms2rescore
venv_309_ms2rescore\Scripts\pip3 install ms2rescore==3.0.2
I'm calling the feature generator as instructed in the MS2Rescore docs:
from ms2rescore.feature_generators.deeplc import DeepLCFeatureGenerator

fgen = DeepLCFeatureGenerator(
    lower_score_is_better=True,  # because we use expect value as 'score'
    spectrum_path=None,  # not relevant
    processes=processes,
    deeplc_retrain=False,
    calibration_set_size=0.15,
)
fgen.add_features(psm_list)
When there are only a few items in psm_list, there's an uncaught exception:
2024-03-22 11:17:35,204 INFO Running DeepLC for PSMs from run (1/1): `F981141_1.tsv9ig132dw.mgf`...
Traceback (most recent call last):
  File "C:\Users\villek\githead\mascot-proj\mascot\www\bin\ML_adapters\MS2RescoreAdapter.py", line 243, in <module>
    main()
  File "C:\Users\villek\githead\mascot-proj\mascot\www\bin\ML_adapters\MS2RescoreAdapter.py", line 218, in main
    _add_DeepLC_features(
  File "C:\Users\villek\githead\mascot-proj\mascot\www\bin\ML_adapters\MS2RescoreAdapter.py", line 126, in _add_DeepLC_features
    fgen.add_features(psm_list)
  File "C:\Users\villek\tmp\venv_309_ms2rescore\lib\site-packages\ms2rescore\feature_generators\deeplc.py", line 163, in add_features
    seq_df=self._psm_list_to_deeplc_peprec(psm_list_calibration)
  File "C:\Users\villek\tmp\venv_309_ms2rescore\lib\site-packages\ms2rescore\feature_generators\deeplc.py", line 211, in _psm_list_to_deeplc_peprec
    peprec = peprec.rename(
  File "C:\Users\villek\tmp\venv_309_ms2rescore\lib\site-packages\pandas\core\frame.py", line 3813, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
  File "C:\Users\villek\tmp\venv_309_ms2rescore\lib\site-packages\pandas\core\indexes\base.py", line 6070, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "C:\Users\villek\tmp\venv_309_ms2rescore\lib\site-packages\pandas\core\indexes\base.py", line 6130, in _raise_if_missing
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['tr', 'seq', 'modifications'], dtype='object')] are in the [columns]"
The workaround in my script is to pass calibration_set_size=1.0 when round(calibration_set_size * len(psm_list[~psm_list['is_decoy']])) == 0. Then _psm_list_to_deeplc_peprec() gets a non-empty array and all is fine. Quite likely I shouldn't even use DeepLC if there aren't enough peptide matches!
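For reference, here is a minimal sketch of that workaround as a standalone helper. The function name safe_calibration_set_size and its parameters are my own (hypothetical, not part of ms2rescore), and it assumes the number of target (non-decoy) PSMs has already been counted, for example with the len(psm_list[~psm_list['is_decoy']]) expression above.

```python
def safe_calibration_set_size(n_targets: int, requested: float = 0.15) -> float:
    """Pick a calibration_set_size that does not round down to zero PSMs.

    Hypothetical helper, not part of ms2rescore. `n_targets` is the number
    of target (non-decoy) PSMs; `requested` is the fraction one would
    normally pass as calibration_set_size.
    """
    if round(requested * n_targets) == 0:
        # The requested fraction would select no PSMs at all,
        # so fall back to using every target PSM for calibration.
        return 1.0
    return requested


# With 3 target PSMs, round(0.15 * 3) == 0, so fall back to 1.0.
few = safe_calibration_set_size(3)
# With 100 target PSMs, round(0.15 * 100) == 15, so keep 0.15.
many = safe_calibration_set_size(100)
```

The returned value would then be passed as calibration_set_size when constructing DeepLCFeatureGenerator. If there are no target PSMs at all, the calibration set is still empty even at 1.0, so in that case it is probably better to skip DeepLC entirely.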