MetaMorpheus
Another crosslinking and database partitioning issue?
Hi, I've noticed that when you adjust the database partition size, the number of identified inter- and intra-protein crosslinks above 1% FDR increases 5-10 fold. My protein databases are fairly small, between 100 and 250 proteins depending on the search. With a database partition size of 1, the results are lackluster. Increasing the partition size to 2 brings the results closer in line with pLink2 and XlinkX. Increasing the partition size further has no significant impact compared to a partition size of 2 until you reach higher values like 25-50, at which point the number of crosslinks identified above 1% FDR drops.
For example, with an unenriched crosslink file and all other parameters held constant (the parameters aren't very stringent: 20 ppm precursor and product tolerance, 2 missed cleavages allowed, 2 variable mods, etc.), I see the following:
Database partition   Inter   Intra   Single   Loop   Deadend
1                    0       2       10150    72     822
2                    418     48      10067    77     814
3                    295     57      10094    79     819
5                    229     59      10112    81     836
25                   70      62      10208    107    845
With an enriched crosslink sample, the results are also very dramatic:
Database partition   Inter   Intra   Single   Loop   Deadend
1                    8       846     101      781    2612
2                    1092    1300    91       803    2622
I would appreciate any advice on whether the results achieved with the increased partition size can be trusted based on q-value/score, or whether there is a bug that is artificially producing high-scoring crosslinked peptide IDs.
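To make the partition effect easier to see, here is a minimal Python sketch that sums the inter- and intra-protein counts per partition size from the two tables above and checks how stable the single-peptide column is. The helper names `crosslink_totals` and `max_relative_spread` are illustrative only and are not part of MetaMorpheus.

```python
# Counts copied from the two tables in this post.
# Each row is: partition -> (inter, intra, single, loop, deadend)
unenriched = {
    1: (0, 2, 10150, 72, 822),
    2: (418, 48, 10067, 77, 814),
    3: (295, 57, 10094, 79, 819),
    5: (229, 59, 10112, 81, 836),
    25: (70, 62, 10208, 107, 845),
}
enriched = {
    1: (8, 846, 101, 781, 2612),
    2: (1092, 1300, 91, 803, 2622),
}


def crosslink_totals(table):
    """Return {partition: inter + intra} for one table (hypothetical helper)."""
    return {partition: row[0] + row[1] for partition, row in table.items()}


def max_relative_spread(table, column):
    """Largest relative deviation from the partition-1 value in one column."""
    base = table[1][column]
    return max(abs(row[column] - base) / base for row in table.values())


if __name__ == "__main__":
    for name, table in (("unenriched", unenriched), ("enriched", enriched)):
        print(name, crosslink_totals(table))
    # Column index 2 is the single-peptide count.
    print("single spread:", max_relative_spread(unenriched, 2))
```

In the unenriched file the single-peptide count moves by less than 1% across partition sizes, while the inter + intra total changes by orders of magnitude, which is why the crosslink numbers in particular look suspicious.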
Thanks for providing the details of the issue. We are aware of the issue (#2039). We have a new update (#2084) which, in theory, will solve this problem. Still, your information is very valuable and I need to run some analysis to confirm it. I didn't expect a difference in database partition size to cause such a big change in IDs. Please wait for more information and the update.
Solved in https://github.com/smith-chem-wisc/MetaMorpheus/pull/2084?
It should be, but we need further feedback.
The recent MM update has affected the crosslink IDs again. Prior to this update, when searched with the database partition set to 2, a sample set gave 2,936 inter-protein, 953 intra-protein, 444 loop, 4,689 mono, and 52,548 single peptides. I reran this exact search on the updated MM and the difference is very dramatic for the inter- and intra-protein crosslinks: 13 inter and 25 intra. Loop, mono, and single peptides are still within similar ranges, with 324, 3,810, and 40,751 respectively. Changing the database partition no longer rescues the results.
to clarify:
- you get a different result now compared to earlier
- now, when you change partitions, you get the same result each time. (I think this is the desired result, correct?)
Hi, yes, the results now are different than prior to the update.
Regarding question #2: yes, the result stays the same when the partition is changed. However, this raises another question: were the prior results correct, are the current results correct, or were they both correct? With the massive difference of 3,889 vs 38 inter- and intra-protein crosslinks for just one experiment between the two versions, this seems like an important answer to know.
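For reference, the 3,889 vs 38 figure is the sum of the inter- and intra-protein counts reported in the earlier message. A small Python check of that arithmetic, plus the fraction of IDs retained per category after the update, might look like this (the dictionary layout and the `retained_fraction` name are just for illustration):

```python
# Counts quoted earlier in this thread for the same search (partition = 2),
# before and after the MetaMorpheus update.
before = {"inter": 2936, "intra": 953, "loop": 444, "mono": 4689, "single": 52548}
after = {"inter": 13, "intra": 25, "loop": 324, "mono": 3810, "single": 40751}


def retained_fraction(old, new):
    """Fraction of IDs in each category that remain after the update."""
    return {category: new[category] / old[category] for category in old}


if __name__ == "__main__":
    print("crosslinks before:", before["inter"] + before["intra"])
    print("crosslinks after: ", after["inter"] + after["intra"])
    for category, fraction in retained_fraction(before, after).items():
        print(f"{category:>6}: {fraction:.1%} retained")
```

Loop, mono, and single peptides each keep roughly three quarters or more of their IDs, while inter- and intra-protein crosslinks keep only a few percent at most.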
The results of our test case also differ from those obtained with the previous version, but the difference is not as large as yours. I have no idea what happened. Do you mind sharing part of your data for us to analyze? It could help us track down a potential bug.
Sure. Is there an email that I can send a google drive link to?
Thank you very much! Please email it to '[email protected]'. Please also include the .toml files you used for the current and previous versions.
Hi, I just wanted to double check that you were able to access the google drive files I sent. Please let me know if there are any other files that I can share to help with this issue!