Another crosslinking and database partitioning issue?

Open emirzakh opened this issue 3 years ago • 10 comments

Hi, I've noticed that when you adjust the database partition size, the number of identified inter- and intra-protein crosslinks above 1% FDR increases 5-10 fold. My protein databases are fairly small, between 100 and 250 proteins depending on the search. With a database partition size of 1, the results are lackluster. Increasing the partition size to 2 brings the results closer in line with pLink2 and XlinkX. Increasing it further has no significant effect compared with a partition size of 2 until you reach higher values such as 25-50, at which point the number of crosslinks identified above 1% FDR drops.

For example, with an unenriched crosslink file and all other parameters held constant (the parameters aren't very stringent: 20 ppm precursor and product tolerance, 2 missed cleavages allowed, 2 variable mods, etc.), I see the following:

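| Database partition | Inter | Intra | Single | Loop | Deadend |
|---|---|---|---|---|---|
| 1 | 0 | 2 | 10150 | 72 | 822 |
| 2 | 418 | 48 | 10067 | 77 | 814 |
| 3 | 295 | 57 | 10094 | 79 | 819 |
| 5 | 229 | 59 | 10112 | 81 | 836 |
| 25 | 70 | 62 | 10208 | 107 | 845 |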

With an enriched crosslink sample, the results are also very dramatic:

| Database partition | Inter | Intra | Single | Loop | Deadend |
|---|---|---|---|---|---|
| 1 | 8 | 846 | 101 | 781 | 2612 |
| 2 | 1092 | 1300 | 91 | 803 | 2622 |

I'd appreciate any advice on whether the results obtained with the increased partition size can be trusted based on q-value/score, or whether a bug is artificially producing high-scoring crosslinked peptide IDs.
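For context on what "above 1% FDR" and "q-value" mean here, the following is a minimal sketch of generic target-decoy q-value estimation. It is not MetaMorpheus's implementation: real crosslink FDR has to treat target-target, target-decoy, and decoy-decoy pairs separately, and the function name `q_values`, the single `is_decoy` flag, and the example scores are all assumptions made for illustration.

```python
from typing import List, Tuple


def q_values(matches: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Assign a q-value to each (score, is_decoy) match.

    Simplified target-decoy estimate: FDR at a rank = decoys / targets
    among all matches scoring at least as well. Hypothetical helper,
    not taken from any search engine's code base.
    """
    # Best-scoring matches first.
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)

    targets = 0
    decoys = 0
    fdrs = []
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value = lowest FDR reachable at this rank or any lower-scoring rank,
    # which makes q-values non-decreasing as the score gets worse.
    running_min = float("inf")
    qs = []
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()

    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qs)]


if __name__ == "__main__":
    # Made-up scores purely for demonstration.
    example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False)]
    for score, is_decoy, q in q_values(example):
        print(score, "decoy" if is_decoy else "target", round(q, 3))
```

Under this scheme, anything that changes which decoys compete with which targets (such as how a database is split before scoring) can shift the number of IDs passing a fixed q-value cutoff, which is why the partition dependence above is worth checking.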

emirzakh avatar Jul 30 '21 01:07 emirzakh

Thanks for providing the details. We are aware of the issue (#2039), and we have a new update (#2084) that should, in theory, solve this problem. Still, your information is very valuable, and I need to run some analysis to confirm it. I didn't expect a change in database partition size to cause such a big change in IDs. Please wait for more information and the update.

lonelu avatar Jul 30 '21 03:07 lonelu

Solved in https://github.com/smith-chem-wisc/MetaMorpheus/pull/2084?

acesnik avatar Aug 26 '21 12:08 acesnik

It should be, but we need further feedback.

lonelu avatar Aug 26 '21 23:08 lonelu

The recent MM update has affected the crosslink IDs again. Prior to this update, when searched with database partition = 2, a sample set gave 2,936 inter-protein, 953 intra-protein, 444 loop, 4,689 mono, and 52,548 single peptides. I reran this exact search on the updated MM, and the difference is very dramatic for the inter- and intra-protein crosslinks: 13 inter and 25 intra. Loop, mono, and single peptides are still within similar ranges, with 324, 3,810, and 40,751 respectively. Changing the database partition no longer rescues the results.
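To put the counts above side by side, here is a small sketch that computes the fraction of IDs retained in each category after the update. The numbers are the ones quoted in this comment; the function name `retained_fraction` and the dictionary layout are assumptions for illustration only.

```python
from typing import Dict

# Counts quoted above for the same search (database partition = 2).
BEFORE: Dict[str, int] = {"inter": 2936, "intra": 953, "loop": 444, "mono": 4689, "single": 52548}
AFTER: Dict[str, int] = {"inter": 13, "intra": 25, "loop": 324, "mono": 3810, "single": 40751}


def retained_fraction(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Return after/before for every category present in `before`."""
    return {category: after[category] / before[category] for category in before}


if __name__ == "__main__":
    for category, fraction in retained_fraction(BEFORE, AFTER).items():
        print(f"{category:>6}: {BEFORE[category]:>6} -> {AFTER[category]:>6} ({fraction:.1%} retained)")
```

Laid out this way, the loss is concentrated in the inter- and intra-protein crosslinks, while loop, mono, and single peptides each keep most of their IDs.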

emirzakh avatar Dec 06 '21 21:12 emirzakh

to clarify:

  1. you get a different result now compared to earlier
  2. now, when you change partitions, you get the same result each time. (I think this is the desired result, correct?)

trishorts avatar Dec 06 '21 23:12 trishorts

Hi, yes, the results now are different than prior to the update.

Regarding question #2 - yes, the result stays the same when the partition is changed. However, this raises another question: were the prior results correct, are the current results correct, or were they both correct? With the massive difference of 3,889 vs 38 inter-peptide crosslinks (inter-protein plus intra-protein) for just 1 experiment between the two versions, this seems like an important answer to know.

emirzakh avatar Dec 07 '21 01:12 emirzakh

The results of our test case also differ from those obtained with the previous version, but not nearly as much as yours. I have no idea what happened. Would you mind sharing part of your data for us to analyze? It could help us track down a potential bug.

lonelu avatar Dec 07 '21 04:12 lonelu

Sure. Is there an email that I can send a google drive link to?

emirzakh avatar Dec 07 '21 16:12 emirzakh

Thank you very much! Please email it to '[email protected]'. Please also include the .toml files you used for the current and previous versions.

lonelu avatar Dec 08 '21 06:12 lonelu

Hi, I just wanted to double check that you were able to access the google drive files I sent. Please let me know if there are any other files that I can share to help with this issue!

emirzakh avatar Dec 15 '21 18:12 emirzakh