MetaMorpheus Another crosslinking and database partitioning issue?

Hi, I've noticed that when you adjust the database partition size, the number of identified inter- and intra-protein crosslinks above 1% FDR increases 5-10 fold. My protein databases are fairly small, between 100 and 250 proteins, depending on the search. With a database partition size of 1, the results are lackluster. Increasing the partition size to 2 brings the results closer in line with pLink2 and XlinkX. Increasing the partition size beyond 2 doesn't have a significant impact until you reach higher values like 25-50, at which point the number of crosslinks identified above 1% FDR drops.

For example, with an unenriched crosslink file and all other parameters held constant (the parameters aren't very stringent: 20 ppm precursor and product tolerance, 2 missed cleavages allowed, 2 variable mods, etc.), I see the following:

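| Database partition | Inter | Intra | single | loop | deadend |
|---|---|---|---|---|---|
| 1 | 0 | 2 | 10150 | 72 | 822 |
| 2 | 418 | 48 | 10067 | 77 | 814 |
| 3 | 295 | 57 | 10094 | 79 | 819 |
| 5 | 229 | 59 | 10112 | 81 | 836 |
| 25 | 70 | 62 | 10208 | 107 | 845 |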

With an enriched crosslink sample, the results are also very dramatic:

| Database partition | Inter | Intra | single | loop | deadend |
|---|---|---|---|---|---|
| 1 | 8 | 846 | 101 | 781 | 2612 |
| 2 | 1092 | 1300 | 91 | 803 | 2622 |
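To make the size of the effect easier to compare across the two samples, the inter and intra counts can be expressed as fold changes relative to a partition size of 1. The following is only a minimal sketch: the counts are copied from the two result sets above, and the function name `fold_change` is a hypothetical helper, not part of MetaMorpheus.

```python
# Inter and intra crosslink counts above 1% FDR, copied from the results above.
# Layout: database partition -> (inter, intra)
unenriched = {1: (0, 2), 2: (418, 48), 3: (295, 57), 5: (229, 59), 25: (70, 62)}
enriched = {1: (8, 846), 2: (1092, 1300)}


def fold_change(counts, baseline=1):
    """Return {partition: (inter_fold, intra_fold)} relative to the baseline partition.

    A fold change is reported as None when the baseline count is zero,
    because the ratio is undefined in that case.
    """
    base = counts[baseline]
    return {
        partition: tuple(
            round(value / ref, 1) if ref else None
            for value, ref in zip(values, base)
        )
        for partition, values in counts.items()
    }


if __name__ == "__main__":
    for label, counts in (("unenriched", unenriched), ("enriched", enriched)):
        for partition, folds in fold_change(counts).items():
            print(label, partition, folds)
```

Note that the unenriched inter count is 0 at a partition size of 1, so no fold change can be computed for that column.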

I would appreciate any advice as to whether the results achieved with the increased partition size can be trusted based on q-value/score, or whether there is a bug that is artificially producing high-scoring crosslinked peptide IDs.

Jul 30 '21 01:07 emirzakh

Thanks for providing the details of the issue. We are aware of the issue (#2039). We have a new update (#2084) that should, in theory, solve this problem. Still, your information is very valuable and I need to run some analysis to confirm it. I didn't expect a change in database partition size to cause such a big change in IDs. Please wait for more information and the update.

Jul 30 '21 03:07 lonelu

Solved in https://github.com/smith-chem-wisc/MetaMorpheus/pull/2084?

Aug 26 '21 12:08 acesnik

It should be, but we need further feedback.

Aug 26 '21 23:08 lonelu

The recent MM update has affected the crosslink IDs again. Prior to this update, when searched with a database partition of 2, a sample set gave 2,936 inter-protein, 953 intra-protein, 444 loop, 4,689 mono, and 52,548 single peptides. I reran this exact search on the updated MM and the difference is very dramatic for the inter- and intra-protein crosslinks: 13 inter and 25 intra. Loop, mono, and single peptides are still within similar ranges, with 324, 3,810, and 40,751 respectively. Changing the database partition no longer rescues the results.
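As a rough illustration of how uneven the change is across categories, the before and after counts can be compared directly. This is only a sketch using the numbers quoted above; the variable and function names are hypothetical.

```python
# Counts reported before and after the MM update (database partition = 2).
before = {"inter": 2936, "intra": 953, "loop": 444, "mono": 4689, "single": 52548}
after = {"inter": 13, "intra": 25, "loop": 324, "mono": 3810, "single": 40751}


def retained_fraction(old, new):
    """Return the fraction of identifications retained per category (new / old)."""
    return {category: new[category] / old[category] for category in old}


def crosslink_total(counts):
    """Sum the inter- and intra-protein crosslink counts."""
    return counts["inter"] + counts["intra"]


if __name__ == "__main__":
    for category, fraction in retained_fraction(before, after).items():
        print(f"{category}: {fraction:.1%} retained")
    print("inter + intra before:", crosslink_total(before))
    print("inter + intra after:", crosslink_total(after))
```

With these numbers, loop, mono, and single peptides retain roughly three quarters of their identifications, while inter- and intra-protein crosslinks retain only a few percent or less.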

Dec 06 '21 21:12 emirzakh

To clarify:

1. You get a different result now compared to earlier.
2. Now, when you change partitions, you get the same result each time. (I think this is the desired result, correct?)

Dec 06 '21 23:12 trishorts

Hi, yes, the results now are different than prior to the update.

Regarding question #2: yes, the result stays the same when the partition is changed. However, this raises another question: were the prior results correct, are the current results correct, or were they both correct? With a massive difference of 3,889 vs 38 combined inter- and intra-protein crosslinks for just one experiment between the two versions, this seems like an important question to answer.

Dec 07 '21 01:12 emirzakh

The results of our test case also differ from those obtained with the previous version. However, they do not differ as much as yours. I have no idea what happened. Do you mind sharing part of your data for us to analyze? It could help us track down a potential bug.

Dec 07 '21 04:12 lonelu

Sure. Is there an email that I can send a google drive link to?

Dec 07 '21 16:12 emirzakh

Thank you very much! Please email it to '[email protected]'. Please also include the .toml files you used for the current and previous versions.

Dec 08 '21 06:12 lonelu

Hi, I just wanted to double check that you were able to access the google drive files I sent. Please let me know if there are any other files that I can share to help with this issue!

Dec 15 '21 18:12 emirzakh
