psm_utils
fix parsing PSMs and complete protein names in XTandem
[edited after adding fix for PSM parsing]
-
XTandem tends to abbreviate protein names in the protein "label" tag, so the protein name is now read from the "note" tag instead.
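A minimal sketch of the idea, not the actual code in this PR: it assumes a heavily simplified XML fragment in which the abbreviated name is a `label` attribute and the full name is the text of a child `note` element. The fragment, the accession, and the helper `protein_name` are hypothetical and only serve as illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment; real XTandem output carries more
# attributes and nesting than shown here.
FRAGMENT = """
<protein label="sp|P12345|EXAMPLE_HUMAN Example protein with a long...">
  <note label="description">sp|P12345|EXAMPLE_HUMAN Example protein with a long description</note>
</protein>
""".strip()


def protein_name(protein_element):
    """Prefer the text of the child "note" tag, fall back to the "label" attribute."""
    note = protein_element.find("note")
    if note is not None and note.text:
        return note.text.strip()
    return protein_element.get("label", "")


protein = ET.fromstring(FRAGMENT)
full_name = protein_name(protein)
```

Falling back to "label" keeps the sketch usable for entries that have no "note" child; whether such entries occur in practice is an assumption.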
-
XTandem saves only the highest scoring PSMs per spectrum, but there can still be more than one PSM per spectrum, with different peptidoforms, if their scores are exactly the same. This is not an extremely rare case, especially for very similar peptides (think of a single amino acid flip in the sequence). This fix groups the identifications that share a peptidoform into one new PSM, and assigns only the relevant proteins to each PSM. Previously, the parser produced spurious protein-to-peptide matches that did not occur in the databases used by XTandem.
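The grouping step can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in this PR: identifications are represented as plain `(peptidoform, protein)` pairs for a single spectrum, and the function name `group_by_peptidoform` and the example sequences and protein IDs are hypothetical.

```python
from collections import defaultdict


def group_by_peptidoform(identifications):
    """Merge the identifications of one spectrum that share a peptidoform.

    `identifications` is a list of (peptidoform, protein) pairs; this layout
    is hypothetical. Returns a dict mapping each peptidoform to the list of
    proteins reported for it, in first-seen order and without duplicates.
    """
    grouped = defaultdict(list)
    for peptidoform, protein in identifications:
        if protein not in grouped[peptidoform]:
            grouped[peptidoform].append(protein)
    return dict(grouped)


# Two equally scoring peptidoforms for the same spectrum (single AA flip).
hits = [
    ("ACDEK", "PROT_A"),
    ("ACDEK", "PROT_B"),
    ("ACEDK", "PROT_C"),
]
psms = group_by_peptidoform(hits)
```

Here each key of `psms` corresponds to one PSM, so "ACDEK" keeps only PROT_A and PROT_B, while "ACEDK" keeps only PROT_C.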
-
As a consequence, the remark that only one protein per peptide/PSM is parsed no longer seems to hold.