quantms

Test the impact and how the parameter num_hits works

Open ypriverol opened this issue 1 year ago • 4 comments

Description of feature

It would be good to test the impact of the num_hits parameter on multiple datasets. The idea is to see how this parameter affects the identification step and the quant results.

ypriverol avatar Jan 16 '24 16:01 ypriverol

LFQ PXD001819 and TMT PXD007683 were tested using different num_hits values (1, 2 and 3).

LFQ results: As num_hits increased, the number of PSMs reported by the search engines increased, but the distribution of search engine scores showed no obvious change. Both target and decoy PSMs from Comet and MSGF increased significantly, but most of the additional PSMs have worse PEP scores. So the final results did not improve with increasing num_hits; performance even dropped a little.

TMT results: consistent with the LFQ results.

daichengxin avatar Jan 22 '24 13:01 daichengxin

If you are using multiple hits, you probably want some more sophisticated consensus scoring, e.g. PEPMatrix, which takes into account the similarities of the top_hits across search engines and allows some kind of reweighting based on the number of times a sequence "scaffold" was identified across multiple engines. No guarantees that it gets better though 😁
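
As a rough illustration of the reweighting idea only: this is not the PEPMatrix algorithm, and every name and number below is hypothetical. The sketch assumes each engine reports (peptide, PEP) hits for a single spectrum and that PEPs for the same sequence can simply be multiplied across engines (an independence assumption), so a sequence found by more engines ends up with a lower combined error probability.

```python
from collections import defaultdict
from math import prod
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (peptide sequence, PEP); lower PEP = better


def combine_hits(hits_by_engine: Dict[str, List[Hit]]) -> List[Tuple[str, float, int]]:
    """Toy consensus: multiply the PEPs of identical sequences across engines.

    Returns (peptide, combined_pep, n_supporting_hits), best first.
    Assumes engines are independent, which is a simplification.
    """
    peps = defaultdict(list)
    for hits in hits_by_engine.values():
        for peptide, pep in hits:
            peps[peptide].append(pep)
    combined = [(seq, prod(values), len(values)) for seq, values in peps.items()]
    return sorted(combined, key=lambda item: item[1])


# Made-up hits for one spectrum. Comet's second-best hit agrees with
# MSGF's top hit, which is only visible when more than one hit is kept.
hits = {
    "comet": [("PEPTLDEK", 0.10), ("PEPTIDEK", 0.20)],
    "msgf": [("PEPTIDEK", 0.15)],
}
ranked = combine_hits(hits)
for peptide, pep, n_hits in ranked:
    print(peptide, round(pep, 3), n_hits)
```

In this toy case the sequence supported by both engines overtakes Comet's top hit, which is the kind of effect a consensus step would need in order to benefit from num_hits > 1.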

jpfeuffer avatar Jan 22 '24 15:01 jpfeuffer

Could also be used during feature linking but we do not have an algorithm for that yet. So no short-term improvements possible there.

jpfeuffer avatar Jan 22 '24 15:01 jpfeuffer

One thing that I am a bit surprised about is that it gets worse. If we are only taking the best PSM per spectrum, nothing should change by adding second-best hits. So maybe somewhere we are using more than just the best hit. If you upload a very small experiment, I can check it when I find time.
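
The expectation above can be checked on toy data. A minimal sketch, assuming PSMs are plain (spectrum_id, peptide, score) tuples where a higher score is better; the function `best_psm_per_spectrum` and the data are hypothetical and not taken from quantms or OpenMS:

```python
from typing import Dict, List, Tuple

PSM = Tuple[str, str, float]  # (spectrum_id, peptide, score); higher = better (assumption)


def best_psm_per_spectrum(psms: List[PSM]) -> Dict[str, PSM]:
    """Keep only the highest-scoring PSM for each spectrum."""
    best: Dict[str, PSM] = {}
    for psm in psms:
        spectrum_id, _, score = psm
        if spectrum_id not in best or score > best[spectrum_id][2]:
            best[spectrum_id] = psm
    return best


# Toy data: with one hit per spectrum, only the top hit is reported.
hits_1 = [("s1", "PEPTIDEK", 0.95), ("s2", "SAMPLER", 0.80)]
# With two hits per spectrum, a lower-scoring second hit is added.
hits_2 = hits_1 + [("s1", "PEPTLDEK", 0.60), ("s2", "SAMPIER", 0.30)]

same = best_psm_per_spectrum(hits_1) == best_psm_per_spectrum(hits_2)
print(same)
```

In this sketch the selection is identical for both inputs, so a drop in performance would suggest that some step downstream of the search sees more than the best hit per spectrum.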

jpfeuffer avatar Jan 22 '24 15:01 jpfeuffer