Documentation or reference for missing models
I can map most of the model names from DeepLCModels to Supplementary Table 2 of the 2021 publication. A couple of gaps are filled by issue #77 (thanks!).
I cannot find any information about these models, which were added after the publication:
full_hc_PXD008783_median_calibrate
full_hc_TMTpro_train_msv000088167_median
full_hc_mod_deeplc_train_filtered
full_hc_multretra_train
full_hc_phospho_kai_li
full_hc_tmt_data_consensus_ticnum_filtered
The PRIDE project gives some clues, of course, but was the data set on MassIVE ever published?
Hi,
Yes, msv000088167 has been published: https://massive.ucsd.edu/ProteoSAFe/dataset.jsp?task=2f82c5f336a441d7a7aee378d84f7a58
With regard to the other models, these were mostly trained on internal data. I did make the models public, as they could be especially useful for TMT (full_hc_tmt_data_consensus_ticnum_filtered), phosphopeptides (full_hc_phospho_kai_li), or modifications in general (full_hc_mod_deeplc_train_filtered). Unfortunately, I am unable to give you a timeline for when this data will be publicly available.
With regard to multretra (full_hc_multretra_train), that was an experimental run where the model was iteratively trained on a large number of datasets. Each dataset was treated as a separate entity and only trained on for a couple of epochs before switching to a new dataset. Although I cannot give any guarantees, it seems this model actually performs very well across a large number of datasets.
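For readers who want a picture of that rotation scheme, here is a minimal sketch. It is not the DeepLC training code: the toy linear model, the function names (`train_epochs`, `rotate_training`, `mean_abs_error`), the dataset names, and all hyperparameters are hypothetical and only illustrate the idea of training for a few epochs on one dataset before moving to the next.

```python
import random


def train_epochs(weights, dataset, epochs, lr=0.05):
    """Run plain SGD on a toy model y = w * x + b (stand-in for the real network)."""
    w, b = weights
    for _ in range(epochs):
        for x, y in dataset:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def rotate_training(datasets, rounds=20, epochs_per_visit=2, seed=0):
    """Visit every dataset in each round, training only a few epochs per visit."""
    rng = random.Random(seed)
    weights = (0.0, 0.0)
    names = list(datasets)
    for _ in range(rounds):
        rng.shuffle(names)  # assumption: visiting order is shuffled per round
        for name in names:
            # The same weights carry over from one dataset to the next.
            weights = train_epochs(weights, datasets[name], epochs_per_visit)
    return weights


def mean_abs_error(weights, dataset):
    w, b = weights
    return sum(abs(w * x + b - y) for x, y in dataset) / len(dataset)


# Hypothetical toy datasets: three disjoint samples of the relation y = 2x + 1.
datasets = {
    "dataset_a": [(i / 20, 2 * (i / 20) + 1) for i in range(0, 20, 3)],
    "dataset_b": [(i / 20, 2 * (i / 20) + 1) for i in range(1, 20, 3)],
    "dataset_c": [(i / 20, 2 * (i / 20) + 1) for i in range(2, 20, 3)],
}

all_points = [point for data in datasets.values() for point in data]
before = mean_abs_error((0.0, 0.0), all_points)
trained = rotate_training(datasets)
after = mean_abs_error(trained, all_points)
print(f"MAE before: {before:.3f}, after: {after:.3f}")
```

The point of the sketch is only that a single set of weights is kept throughout, so each short visit to a dataset continues from whatever the previous datasets left behind.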
Hope that helps :),
Robbin
Yes, that's useful. I think this should be highlighted in the DeepLCModels README.md.