MetaMorpheus
Unintuitive proteoform/protein output
I made a file containing 8 PrSMs, one at each of the proteoform ambiguity classification levels (1, 2A, 2B, 2C, 2D, 3, 4, and 5). All 8 PrSMs were reported in the PrSM output, but only 3 proteoforms were reported in the proteoform output and 6 proteins were reported in the ProteinGroup output.
It's a little weird to me that we only reported 3 unique proteoforms, even though we identified 8 unique proteoforms. Stranger still is the ability to identify 6 unique protein groups from only 3 unique proteoforms.
The reason is that the Peptide/Proteoform output requires an unambiguous full sequence, whereas the ProteinGroup output only requires an unambiguous base sequence for parsimony. A PrSM with, say, an ambiguous modification site fails the first filter but still passes the second.
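To illustrate why the two outputs can disagree, here is a minimal Python sketch. It is not MetaMorpheus code: the `Prsm` class, its fields, and the toy sequences are all hypothetical, and it only assumes that an ambiguous PrSM carries more than one candidate value for the ambiguous field.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Prsm:
    """Hypothetical PrSM record; each field lists every candidate value."""
    full_sequences: Tuple[str, ...]  # base sequence plus modifications
    base_sequences: Tuple[str, ...]  # amino acid sequence only


def is_unambiguous(candidates: Tuple[str, ...]) -> bool:
    # A field is unambiguous when all candidates collapse to one value.
    return len(set(candidates)) == 1


def proteoform_output(prsms):
    # Stricter filter: the full sequence must be unambiguous.
    return {p.full_sequences[0] for p in prsms if is_unambiguous(p.full_sequences)}


def base_sequences_for_parsimony(prsms):
    # Looser filter: only the base sequence must be unambiguous.
    return {p.base_sequences[0] for p in prsms if is_unambiguous(p.base_sequences)}


prsms = [
    # Fully characterized: passes both filters.
    Prsm(("PEPT[Phospho]IDE",), ("PEPTIDE",)),
    # Ambiguous modification site: one base sequence, two full sequences.
    Prsm(("S[Phospho]EQSK", "SEQS[Phospho]K"), ("SEQSK",)),
    # Ambiguous amino acid sequence: fails both filters.
    Prsm(("PROTEINK", "PROTELNK"), ("PROTEINK", "PROTELNK")),
]

proteoforms = proteoform_output(prsms)
bases = base_sequences_for_parsimony(prsms)
print(len(proteoforms), len(bases))
```

In this toy input the second PrSM contributes to the protein-level result but not to the proteoform-level result, which is the same kind of mismatch described above.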
I'm not sure how pressing this issue is (or if it's even an issue), but it doesn't look like a quick fix.
I've been thinking about proteoform parsimony. Say we identified two PrSMs:
A) PROTEOFORM (with unlocalized +16 mass shift)
B) PROTEOFORM(Ox)
We should output a single proteoform "PROTEOFORM(Ox)" for the two, rather than reporting both.
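A rough Python sketch of that idea, under stated assumptions: the `Identification` class, the `collapse` function, and the 0.01 Da tolerance are all hypothetical and not part of MetaMorpheus. An identification with an unlocalized mass shift is dropped whenever a localized identification with the same base sequence and a matching total mass shift already explains it.

```python
from dataclasses import dataclass
from typing import List, Optional

OXIDATION_MASS = 15.9949  # monoisotopic mass of oxidation, in Da


@dataclass(frozen=True)
class Identification:
    """Hypothetical proteoform-level identification."""
    base_sequence: str
    mass_shift: float             # total modification mass, in Da
    full_sequence: Optional[str]  # None when the shift is unlocalized


def collapse(ids: List[Identification], tol: float = 0.01) -> List[Identification]:
    """Keep localized IDs; keep an unlocalized ID only if no localized ID explains it."""
    localized = [i for i in ids if i.full_sequence is not None]
    kept = list(localized)
    for candidate in ids:
        if candidate.full_sequence is not None:
            continue  # already kept above
        explained = any(
            loc.base_sequence == candidate.base_sequence
            and abs(loc.mass_shift - candidate.mass_shift) <= tol
            for loc in localized
        )
        if not explained:
            kept.append(candidate)
    return kept


a = Identification("PROTEOFORM", 15.995, None)                       # PrSM A
b = Identification("PROTEOFORM", OXIDATION_MASS, "PROTEOFORM(Ox)")   # PrSM B
unrelated = Identification("PROTEOFORM", 42.011, None)               # unexplained shift

result = collapse([a, b, unrelated])
print([i.full_sequence for i in result])
```

Here A is absorbed into B, while the unlocalized shift of about 42 Da has no localized counterpart and is kept as its own entry. Whether mass agreement alone is a strong enough criterion for merging is an open design question.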