
Documentation or reference for missing models

Open vrkosk opened this issue 1 year ago • 2 comments

I can map most of the model names from DeepLCModels to the supplementary table 2 in the 2021 publication. A couple gaps are filled by issue #77 (thanks!).

I cannot find any information about these models, which were added after the publication:

full_hc_PXD008783_median_calibrate
full_hc_TMTpro_train_msv000088167_median
full_hc_mod_deeplc_train_filtered
full_hc_multretra_train
full_hc_phospho_kai_li
full_hc_tmt_data_consensus_ticnum_filtered

The PRIDE project gives some clues, of course, but was the data set on MassIVE ever published?

vrkosk avatar Jul 30 '24 14:07 vrkosk

Hi,

Yes, msv000088167 has been published: https://massive.ucsd.edu/ProteoSAFe/dataset.jsp?task=2f82c5f336a441d7a7aee378d84f7a58

With regard to the other models, these were mostly trained on internal data. I did make the models public, as they could be especially useful for TMT (full_hc_tmt_data_consensus_ticnum_filtered), phosphopeptides (full_hc_phospho_kai_li), or modifications in general (full_hc_mod_deeplc_train_filtered). Unfortunately, I cannot give you a timeline for when this data will be publicly available.

With regard to multretra (full_hc_multretra_train), that was an experimental run where the model was iteratively trained on a large number of datasets. Each dataset was treated as a separate entity and trained on for only a couple of epochs before switching to a new dataset. Although I cannot give any guarantees, this model seems to perform very well across a large number of datasets.
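
For illustration only, the schedule described above could look roughly like the sketch below. This is not the actual DeepLC training code: the toy linear model, the function names (`visit_schedule`, `train_one_epoch`, `train_iteratively`), the number of epochs per visit, the number of rounds, and the learning rate are all assumptions made for the example. Whether the real run revisited datasets more than once is also not stated here.

```python
from typing import Dict, Iterator, List, Tuple

# Toy stand-in for a dataset: (feature, retention time) pairs.
Dataset = List[Tuple[float, float]]


def visit_schedule(
    names: List[str], epochs_per_visit: int, rounds: int
) -> Iterator[Tuple[int, str, int]]:
    """Yield (round, dataset name, epoch) in the order the model sees them."""
    for rnd in range(rounds):
        for name in names:
            for epoch in range(epochs_per_visit):
                yield rnd, name, epoch


def train_one_epoch(weights: List[float], data: Dataset, lr: float) -> None:
    """One SGD pass of a toy linear model y = w*x + b, updating weights in place."""
    for x, y in data:
        error = weights[0] * x + weights[1] - y
        weights[0] -= lr * error * x
        weights[1] -= lr * error


def train_iteratively(
    datasets: Dict[str, Dataset],
    epochs_per_visit: int = 2,  # assumed value for "a couple of epochs"
    rounds: int = 3,            # assumed: how often each dataset is revisited
    lr: float = 0.01,
) -> List[float]:
    """Train one shared model, switching dataset after a few epochs each."""
    weights = [0.0, 0.0]  # shared across all datasets and never reset
    for _, name, _ in visit_schedule(list(datasets), epochs_per_visit, rounds):
        # Each dataset is kept separate: an epoch never mixes datasets.
        train_one_epoch(weights, datasets[name], lr)
    return weights


if __name__ == "__main__":
    toy = {"a": [(1.0, 2.0), (2.0, 4.0)], "b": [(3.0, 6.0)]}
    print(train_iteratively(toy))
```

The point of the sketch is only the loop structure: a single set of weights carried from dataset to dataset, with each dataset visited for a few epochs at a time instead of pooling all data into one training set.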

Hope that helps :),

Robbin

RobbinBouwmeester avatar Jul 30 '24 17:07 RobbinBouwmeester

Yes, that's useful. I think this should be highlighted in the DeepLCModels README.md.

vrkosk avatar Jul 31 '24 15:07 vrkosk