ms2rescore

Uncaught exception in DeepLCFeatureGenerator if not enough peptides for calibration set

Open vrkosk opened this issue 3 months ago • 2 comments

I'm getting an uncaught exception when trying to use ms2rescore.feature_generators.deeplc.DeepLCFeatureGenerator. The error happens when there are not enough peptides in psm_list for the calibration set.

Here's how I create the environment:

C:\python\python309\python.exe -m venv venv_309_ms2rescore
venv_309_ms2rescore\Scripts\pip3 install ms2rescore==3.0.2

I'm calling the feature generator as instructed in MS2Rescore docs:

    from ms2rescore.feature_generators.deeplc import DeepLCFeatureGenerator

    fgen = DeepLCFeatureGenerator(
        lower_score_is_better=True, # because we use expect value as 'score'
        spectrum_path=None, # not relevant
        processes=processes,
        deeplc_retrain=False,
        calibration_set_size=0.15,
    )

    fgen.add_features(psm_list)

When there are only a few items in psm_list, there's an uncaught exception:

2024-03-22 11:17:35,204 INFO Running DeepLC for PSMs from run (1/1): `F981141_1.tsv9ig132dw.mgf`...
Traceback (most recent call last):
  File "C:\Users\villek\githead\mascot-proj\mascot\www\bin\ML_adapters\MS2RescoreAdapter.py", line 243, in <module>
    main()
  File "C:\Users\villek\githead\mascot-proj\mascot\www\bin\ML_adapters\MS2RescoreAdapter.py", line 218, in main
    _add_DeepLC_features(
  File "C:\Users\villek\githead\mascot-proj\mascot\www\bin\ML_adapters\MS2RescoreAdapter.py", line 126, in _add_DeepLC_features
    fgen.add_features(psm_list)
  File "C:\Users\villek\tmp\venv_309_ms2rescore\lib\site-packages\ms2rescore\feature_generators\deeplc.py", line 163, in add_features
    seq_df=self._psm_list_to_deeplc_peprec(psm_list_calibration)
  File "C:\Users\villek\tmp\venv_309_ms2rescore\lib\site-packages\ms2rescore\feature_generators\deeplc.py", line 211, in _psm_list_to_deeplc_peprec
    peprec = peprec.rename(
  File "C:\Users\villek\tmp\venv_309_ms2rescore\lib\site-packages\pandas\core\frame.py", line 3813, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
  File "C:\Users\villek\tmp\venv_309_ms2rescore\lib\site-packages\pandas\core\indexes\base.py", line 6070, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "C:\Users\villek\tmp\venv_309_ms2rescore\lib\site-packages\pandas\core\indexes\base.py", line 6130, in _raise_if_missing
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['tr', 'seq', 'modifications'], dtype='object')] are in the [columns]"

The workaround in my script is to pass calibration_set_size=1.0 when round(calibration_set_size * len(psm_list[~psm_list['is_decoy']])) == 0. Then _psm_list_to_deeplc_peprec() gets a non-empty array and all is fine. Quite likely I shouldn't even use DeepLC if there aren't enough peptide matches!
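
For anyone who wants to copy the workaround, here is roughly what the check looks like as a small helper. This is only a sketch of what my script does, not anything from ms2rescore itself: the function name `safe_calibration_set_size` is made up for illustration, and it takes the number of target PSMs as a plain integer so it can be tried without a real psm_list.

```python
def safe_calibration_set_size(n_targets: int, requested: float = 0.15) -> float:
    """Return `requested`, or 1.0 if it would select zero target PSMs.

    Hypothetical helper (not part of ms2rescore). `n_targets` is the number
    of non-decoy PSMs, e.g. len(psm_list[~psm_list["is_decoy"]]).
    """
    # Same condition as in the workaround: the fraction rounds down to an
    # empty calibration set, so fall back to using all target PSMs.
    if round(requested * n_targets) == 0:
        return 1.0
    return requested


# Intended use (assumes psm_list and processes exist as in the snippet above):
#   n_targets = len(psm_list[~psm_list["is_decoy"]])
#   fgen = DeepLCFeatureGenerator(
#       ...,
#       calibration_set_size=safe_calibration_set_size(n_targets, 0.15),
#   )
```

Note that this only helps when there is at least one target PSM. With zero targets, a fraction of 1.0 still gives an empty calibration set, so the caller would probably want to skip DeepLC in that case.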

vrkosk, Mar 22 '24 11:03