mzLib

Certain modifications are kept in features other than "modified residue"

Open leahvschaffer opened this issue 6 years ago • 18 comments

Two cases I've found (so far)

  1. ptmlist.txt file has mods with PP as "Protein core" - this isn't a case looked for
  2. A lipidation mod in the UniProt .xml is feature type "lipid moiety-binding region" instead of "modified residue"

leahvschaffer avatar Sep 10 '18 19:09 leahvschaffer

Nice find, especially on the lipidation thing!

We could just change `if (FeatureType == "modified residue")` to `if (FeatureType == "modified residue" || FeatureType == "lipid moiety-binding region")` in `ProteinXmlEntry` for a quick fix. These would be labeled "modified residues" during writing, but that seems okay.
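As a rough illustration of that check, here is a minimal Python sketch (mzLib itself is C#, so this is not the library's code). It keeps the accepted feature types in a set, so adding another type later is a one-line change. The names `MOD_FEATURE_TYPES`, `SAMPLE` and `modification_features` are hypothetical, and the XML fragment is hand-written in the shape of a UniProt entry with the namespace omitted.

```python
import xml.etree.ElementTree as ET

# Feature types treated as modifications. The first is the one the thread says
# is checked today; the second is the proposed addition.
MOD_FEATURE_TYPES = {"modified residue", "lipid moiety-binding region"}

# Hypothetical, hand-written fragment shaped like a UniProt entry
# (the real files declare an XML namespace, omitted here for brevity).
SAMPLE = """
<entry>
  <feature type="modified residue" description="Phosphoserine">
    <location><position position="12"/></location>
  </feature>
  <feature type="lipid moiety-binding region" description="S-palmitoyl cysteine">
    <location><position position="3"/></location>
  </feature>
  <feature type="chain" description="Example protein">
    <location><begin position="1"/><end position="100"/></location>
  </feature>
</entry>
"""

def modification_features(xml_text):
    """Return (position, description) for each feature of an accepted type."""
    root = ET.fromstring(xml_text)
    found = []
    for feature in root.iter("feature"):
        if feature.get("type") not in MOD_FEATURE_TYPES:
            continue  # e.g. "chain" is skipped
        position = feature.find("location/position")
        if position is not None:
            found.append((int(position.get("position")), feature.get("description")))
    return found

print(modification_features(SAMPLE))
```

With the set-based check, the lipidation feature is picked up alongside the ordinary modified residue, while the "chain" feature is still ignored.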

acesnik avatar Sep 10 '18 20:09 acesnik

Can "Protein core" be any amino acid in the protein, even on the termini? I can't find a definition at UniProt. Google says that the protein core is simply the solvent-inaccessible part. I don't think that necessarily precludes termini (though possibly it does, since they're charged?)

trishorts avatar Oct 09 '18 12:10 trishorts

There are modifications that only occur in the protein core? That's kind of fascinating if true.

acesnik avatar Oct 09 '18 12:10 acesnik

I guess it makes sense that some of these reactive side chains would only be stable when protected in the core of the protein, like 4-thiazolecarboxylic acid and 2,3-didehydroalanine.

acesnik avatar Oct 09 '18 12:10 acesnik

If we were to use one of these mods in a GPTMD scenario, what would the rules be?

trishorts avatar Oct 09 '18 12:10 trishorts

I think we'd treat them as any other mod, right? Being in the protein core is (to me) only relevant in a secondary+ structural sense. I guess we're assuming any proteins run through our software are 1) denatured and then digested (BU), 2) intact-mass (irrelevant where the mod is), 3) denatured and then shot on the mass spec (TD), or 4) not denatured and shot on the mass spec (native). Doesn't this only apply for native MS?

rmillikin avatar Oct 09 '18 15:10 rmillikin

I did an analysis of "feature type" in human canonical uniprot.xml and got the following table: image

trishorts avatar Oct 09 '18 18:10 trishorts

Lipid-binding is pretty far down the list (but fine). I guess we need to look at these and decide what to do. I suspect there are other things we'd like to have.

trishorts avatar Oct 09 '18 18:10 trishorts

This command is handy for examining a large file:

grep "feature type=" xml.xml > ft.txt

It dumps every line containing "feature type=" to a new text file, ft.txt.
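To go from those matching lines to a per-type tally like the table above, a short Python sketch along these lines might work. It is an assumption about one way to do the counting, not how the table was actually produced. The function name `count_feature_types` and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Captures the value of the type attribute on <feature ...> elements.
FEATURE_TYPE = re.compile(r'<feature\b[^>]*\btype="([^"]*)"')

def count_feature_types(lines):
    """Tally feature types over an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(FEATURE_TYPE.findall(line))
    return counts

# Hand-written sample lines standing in for a real UniProt XML file.
sample_lines = [
    '<feature type="modified residue" description="Phosphoserine">',
    '<feature type="chain" description="Example protein">',
    '<feature type="modified residue" description="N-acetylalanine">',
    '<sequence length="100">',
]

# Print the most common feature types first.
for feature_type, n in count_feature_types(sample_lines).most_common():
    print(n, feature_type)

# For a real file, pass the open handle instead of the sample list, e.g.:
#   with open("xml.xml", encoding="utf-8") as handle:
#       counts = count_feature_types(handle)
```

Reading line by line keeps memory use flat, which matters for a full proteome XML.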

trishorts avatar Oct 09 '18 18:10 trishorts

Nice analysis. Thanks, Shortreed.

acesnik avatar Oct 09 '18 19:10 acesnik

We could also take "metal ion-binding site" into account.

EDIT: an example: image

acesnik avatar Oct 09 '18 19:10 acesnik

Are the "non-standard amino acid" features all selenocysteine? Are these symbols included in the sequence?

EDIT: yes, it looks like they're all selenocysteine for Homo sapiens. There's also this interesting preceding site feature in one instance: image

EDIT: yes, it looks like they're also in the sequences. image

acesnik avatar Oct 09 '18 19:10 acesnik

no idea

trishorts avatar Oct 09 '18 19:10 trishorts

Is "non-terminal residue" from circular peptides? And what the heck is the singular "non-consecutive residues" feature?

acesnik avatar Oct 09 '18 19:10 acesnik

This non-consecutive definition is pretty vague: https://www.uniprot.org/help/non_cons

acesnik avatar Oct 09 '18 19:10 acesnik

A recent conversation with tal fellers makes me think that if we read a "modified residue" from uniprot.xml and we don't have a matching modification, we should provide an error message (rather than skipping it automatically). For example: MM cannot interpret modification '(2S)-4-hydroxyleucine' in protein P12345 from human_protein_canonical.xml
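A minimal Python sketch of that reporting idea, assuming the annotations have already been read as (accession, description) pairs and the known modifications are available as a set of names. The function `unmatched_mod_messages` and the contents of `known_mods` are hypothetical. Whether the message is raised as an error or logged as a warning is left to the caller.

```python
def unmatched_mod_messages(annotations, known_mods, database_name):
    """Build one message per annotated modification that has no known match."""
    messages = []
    for accession, description in annotations:
        if description not in known_mods:
            messages.append(
                f"MM cannot interpret modification '{description}' "
                f"in protein {accession} from {database_name}"
            )
    return messages

# Illustrative inputs only; the known-mod set is not a real modification list.
known_mods = {"Phosphoserine"}
annotations = [
    ("P12345", "Phosphoserine"),
    ("P12345", "(2S)-4-hydroxyleucine"),
]

for message in unmatched_mod_messages(annotations, known_mods, "human_protein_canonical.xml"):
    print(message)
```

Collecting the messages instead of printing inside the loop lets the caller decide how loudly to report them.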

trishorts avatar Oct 11 '18 14:10 trishorts

An error as opposed to a warning?

acesnik avatar Oct 11 '18 19:10 acesnik

I moved that discussion to another issue https://github.com/smith-chem-wisc/mzLib/issues/417, since it's kind of separate from this one.

acesnik avatar Oct 11 '18 19:10 acesnik