pyopenms-docs

Potential useful information to review and add to the readthedocs page - from learning unit 10

Open timosachsenberg opened this issue 1 year ago • 0 comments

In-vitro and in-vivo modifications

Motivation

Cells need to respond rapidly to external stimuli and perturbations. Changing the expression level of a protein, e.g. by regulating transcription, translation or protein degradation, is often too slow. In contrast, enzymes can rapidly modify proteins chemically, activating or deactivating the whole protein or just certain active sites (i.e. a protein function). This ability to react quickly to changes is used in many important cellular mechanisms to tightly control the interaction of many proteins. For instance, platelets lack a nucleus but need to react quickly for proper blood clotting, and this process is regulated by modifications. Enzyme cascades also contribute to the rapid switching of cellular functions. As in a snowball scheme, one activated enzyme, say an extracellular receptor extending into the cell's interior with an active site for protein modification, can trigger the (de)activation of certain enzymes, which in turn (de)activate others until the final functional change is achieved. An example under active research is the MAPK/ERK pathway, which features several kinases that control cell growth. These cascades also enable the cell to simultaneously activate or deactivate multiple functions upon a single incoming signal, although this adds another layer of complexity. Not surprisingly, cancerous cells often exhibit a misregulation of protein activity by protein modifications.

MS-based proteomics makes it possible to identify and quantify protein modifications. It can therefore reveal information that would remain hidden if only high-throughput techniques such as genome or transcriptome sequencing were employed.

Different types of chemical modifications are regularly observed in MS experiments. Some of them are intrinsic to sample handling, others are introduced deliberately, e.g. for labeling. Both can hence be referred to as in-vitro modifications.

Common modifications introduced by sample handling (in-vitro):
- Carbamidomethylation (Cys, + ~57 Da): protection of reduced thiol groups with iodoacetamide
- Oxidation (Met, + ~16 Da): exposure to air
- Pyro-Glu (N-terminal Gln, − ~17 Da): spontaneous
- Deamidation (Asn, especially in Asn-Gly motifs, + ~1 Da): spontaneous
- Sodiation (Asp, Glu, + ~22 Da): from salt
- Carbamylation (N-termini and Lys, + ~43 Da): from metabolites of urea

Common in-vivo modifications:
- Phosphorylation (Ser, Thr, Tyr, + ~80 Da): from kinase signaling. Phosphorylation is one of the most important PTMs, since most signaling pathways are propagated via phosphorylation/dephosphorylation
- Acetylation (N-termini and Lys, + ~42 Da): co-translational, often combined with removal of the initiator Met of the protein
- Glycosylation (Asn, Ser, Thr): co- or post-translational attachment of glycans (oligosaccharides). Important for the folding of some proteins and for cell-cell and cell-extracellular matrix attachment
- Proline hydroxylation (Pro, + ~16 Da): stabilizes collagen in the extracellular matrix
- Ubiquitination (Lys, + ~114 Da): marks proteins for degradation

Note that in MS-based proteomics, the term PTM is often used for both co-translational and post-translational modifications.

Cellular signaling pathways can be very complex, overlapping and difficult to analyze. For example, in pathway I, a protein A phosphorylates protein B, and B phosphorylates C. In pathway II, a protein D phosphorylates A or phosphorylates B. Untangling these relations can be very time-consuming and laborious.

Example Epidermal Growth Factor (EGF): The discovery of growth factors, including EGF, by Stanley Cohen and Rita Levi-Montalcini led to the Nobel Prize in Physiology or Medicine in 1986. EGF induces cellular proliferation and differentiation and is essential for survival. On the other hand, increased activity of its receptor has been observed in various cancers. If EGF binds to the epidermal growth factor receptor on the cell surface, a ligand-induced dimerization is triggered. This activates the protein-tyrosine kinase activity of the receptor, which in turn initiates a signal transduction cascade involving the phosphorylation of various proteins. These regulate the transcription of genes that induce DNA replication and cell proliferation.

Modification classification

There are many different types of protein modifications, many of which are closely related in terms of their chemical reaction. They can also be distinguished by the site at which they occur in an amino acid sequence. It is also inherent to chemical reactions that they can be described (named) in several different ways. To keep track of these relations and to report modifications concisely in a machine-readable (and unified) way, a database and controlled vocabulary of protein modifications is of great help. UniMod provides a comprehensive database of protein modifications and is also available as a controlled vocabulary.

You can browse through the database and explore different modifications (login as guest). The view function gives you a detailed description of the respective modification. UniMod is a comprehensive database of protein modifications for mass spectrometry applications and does not place this information in a biological context. It is community supported and curated.

Important modification descriptors are, next to the chemical reaction, their:
- Site, which is the residue where the modification can take place: either one of the amino acids or the N/C-terminus
- Position, which is where the modification can occur. It may be position independent, at any N/C-term (e.g. a peptide terminus) or at the protein N/C-term
- Mass, which is the mass difference added by the modification

For databases that place modifications into biological context, see the RESID, UniProt or Prosite databases. In the next chapters, we will introduce three major groups of protein modifications.
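The three descriptors (site, position, mass) map naturally onto a small record type. The following is a minimal sketch in plain Python and does not use the UniMod or pyOpenMS APIs. The class name `Modification`, the table `MODS` and the helper `mods_for_site` are hypothetical names chosen for illustration; the mass deltas are standard monoisotopic values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Modification:
    """Hypothetical record holding the descriptors discussed above."""
    name: str               # short title of the modification
    site: str               # one-letter residue code, or "N-term" / "C-term"
    position: str           # e.g. "Anywhere", "Any N-term", "Protein N-term"
    mono_mass_delta: float  # monoisotopic mass difference in Da


# Illustrative subset only; a real controlled vocabulary is far larger.
MODS = [
    Modification("Phospho", "S", "Anywhere", 79.966331),
    Modification("Phospho", "T", "Anywhere", 79.966331),
    Modification("Phospho", "Y", "Anywhere", 79.966331),
    Modification("Acetyl", "K", "Anywhere", 42.010565),
    Modification("Acetyl", "N-term", "Protein N-term", 42.010565),
    Modification("Carbamidomethyl", "C", "Anywhere", 57.021464),
    Modification("Oxidation", "M", "Anywhere", 15.994915),
]


def mods_for_site(site):
    """Return all modifications in the table that can occur at the given site."""
    return [m for m in MODS if m.site == site]


for mod in mods_for_site("K"):
    print(mod.name, mod.position, mod.mono_mass_delta)
```

Keeping site and position as separate fields mirrors the distinction made above: acetylation of Lys can occur anywhere, whereas N-terminal acetylation is restricted to the protein N-terminus.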

Phosphorylation

Phosphorylation is the addition of a phosphate group to a certain site of an amino acid. Only three amino acids (four, if histidine is counted) have side chains that can be chemically altered in that way: serine, threonine and tyrosine. The phosphorylation of histidine is not known to occur in the animal kingdom other than in mitochondrial proteins. Protein phosphorylation results in conformational changes, uncovering or blocking certain sites in the protein. As such, it often determines the activation status of an active site of an enzyme. It plays an important role in signal transduction pathways. Although PTMs are a very common means of cellular signal transduction, modified proteins are present in a biological sample only at sub-stoichiometric levels relative to their unmodified counterparts. Therefore, enrichment techniques are applied to enable identification and localization by mass spectrometry.

Enrichment of phosphorylated peptides

TiO2 has proven to be particularly useful for phosphopeptide enrichment prior to LC–MS/MS analysis. TiO2 beads are packed by centrifugation into equilibrated C-18 chromatography columns. Owing to the affinity of the phosphorylated sites for TiO2, the eluates of such columns are enriched in phosphorylated peptides.

Montoya et al. Characterization of a TiO2 enrichment method for label-free quantitative phosphoproteomics. Methods. Aug 2011; 54(4): 370–378. doi: 10.1016/j.ymeth.2011.02.004

Other classes of modifications introduce a covalent addition of further groups to the protein. These will be discussed in the following chapters.

Acetylation

Protein acetylation determines the structure, function and intracellular localization of proteins. It plays an important role in signal transduction pathways. Protein acetylation in cells is regulated by the coordinated action of histone acetyl transferases (HAT) and histone deacetylases (HDAC). Histone deacetylation inhibits the progress of many nuclear events, including proliferation and damage response events. For that reason, HDACs are prominent targets for the development of anticancer drugs and adjuvants.

Enrichment of acetylated peptides

Enrichment with anti-acetyllysine antibodies is currently the state-of-the-art method for acetylated peptides.

Choudhary et al. Lysine Acetylation Targets Protein Complexes and Co-Regulates. Science 325, 834 (2009); DOI: 10.1126/science.1175371

Glycosylation

Glycosylation, the attachment of sugars, is most likely the most complex of all modifications because of the structural diversity of the attached glycans. Glycans (polysaccharides) vary widely in their composition and often have a tree-like structure. Glycosylation of proteins mainly takes place in the endoplasmic reticulum (ER) and the Golgi, where enzymes such as glycosyltransferases and glycosidases attach the sugar groups to the proteins. These enzymes mainly target S, T and N. Direct effects of glycosylation are charge alteration, conformational changes and changes in protein stability.

Glycosylation and disease

Abnormal protein glycosylation has been correlated with several diseases, see e.g.:
- Cancer (Kim et al. Implication of aberrant glycosylation in cancer and use of lectin for cancer biomarker discovery. Protein Pept Lett. 2009;16(5):499-507. PMID: 19442229)
- Inflammatory diseases (Brooks. Strategies for analysis of the glycosylation of proteins: current status and future perspectives. Mol Biotechnol. 2009 Sep;43(1):76-88. Epub 2009 Jun 9. PMID: 19507069)

In humans, glycans attached to proteins are composed of eight different monosaccharides:
- Mannose (Man)
- Glucose (Glc)
- Galactose (Gal)
- Fucose (Fuc)
- N-Acetylgalactosamine (GalNAc)
- N-Acetylglucosamine (GlcNAc)
- N-Acetylneuraminic acid (NeuNAc)
- Xylose (Xyl)

These sugars can be linked in linear or branching chains of various sequences and lengths.

Sugars such as Glc, Gal and Man have identical masses and charges because they are stereoisomers, and their combination in complex glycans can result in a broad range of different glycoforms. Glycoproteomics experiments therefore result in very complex data sets.

The two most common forms are O-linked and N-linked glycosylation, with the glycan attached to S or T, or to the amide group of N, respectively.

Enrichment of glycoproteins and glycopeptides

Due to their substoichiometric abundance, glycoproteins and glycopeptides need to be enriched prior to LC-MS analysis:
- Lectin-based enrichment of glycoproteins. Lectins are sugar-binding proteins and are most commonly used to isolate glycoproteins. In nature, lectins play a role during virus attachment to host cells, where the virus exploits the lectin affinity to glycosylated membrane proteins.
- Linker-based enrichment of glycoproteins. Here, linker molecules are used that attach to the glycan on one side and to a bead on the other.

MS of glycoproteins

The hydrophilic nature of glycans limits their surface activity and ionization efficiency:
- Neutral and basic glycoconjugates can be protonated
- Acidic glycoconjugates can only be deprotonated (negative ESI mode!)

Often, derivatization is used to increase hydrophobicity and volatility (and thereby ionization efficiency)

MS/MS can be performed to sequence the glycan as well as the underlying peptides but MS/MS of glycopeptides is more complicated.

With CID, the collision energy strongly determines the content of the MS/MS spectrum. At low collision energy, the tandem spectrum is dominated by ions from the sequential loss of sugars and occasionally the precursor ion, but the peptide backbone is not fragmented. As the collision energy is increased, the signals of the sugar residues diminish and the peptide backbone ions become visible. Information on the sugar stereochemistry (Glc, Gal or Man?), the linkage (1->4 or 1->6?) or the branching pattern cannot be obtained using conventional CID fragmentation.

This can be achieved by "cross-ring" fragmentation on MALDI-TOF-TOF instruments. Very high collision energies are needed (on the order of keV).

Identification of modified peptides

Only in recent years have technological advances and improved mass accuracy helped enormously in the unambiguous identification of modified proteins.

While amino acids differ considerably in mass, many modifications have very similar masses. For example:
- Acetylation: 42.010565 Da
- Guanidination: 42.021798 Da
- Tri-methylation: 42.046950 Da

This leads to ambiguities if the precursor mass accuracy is not high enough to distinguish these modifications. For the given example, distinguishing acetylation from guanidination on a peptide of about 800 Da requires a precursor mass accuracy better than roughly 13 ppm.
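The numbers above can be checked with a short calculation. This sketch derives the three mass shifts from their elemental compositions and estimates the relative mass difference between two of them. It assumes that the 800 Da refers to the unmodified peptide and that the ppm value is taken relative to the modified mass; the function names are illustrative only.

```python
# Monoisotopic masses of the elements involved (Da).
MONO = {"H": 1.00782503207, "C": 12.0, "N": 14.0030740048, "O": 15.9949146196}

# Elemental composition added by each modification.
DELTA_FORMULAS = {
    "Acetylation": {"C": 2, "H": 2, "O": 1},
    "Guanidination": {"C": 1, "H": 2, "N": 2},
    "Tri-methylation": {"C": 3, "H": 6},
}


def mono_mass(formula):
    """Monoisotopic mass of an elemental composition given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


def ppm_separation(delta_a, delta_b, peptide_mass):
    """Relative mass difference (ppm) between two modified forms of one peptide."""
    lighter = peptide_mass + min(delta_a, delta_b)
    return abs(delta_a - delta_b) / lighter * 1e6


deltas = {name: mono_mass(f) for name, f in DELTA_FORMULAS.items()}
for name, mass in deltas.items():
    print(f"{name}: {mass:.6f} Da")

ppm = ppm_separation(deltas["Acetylation"], deltas["Guanidination"], 800.0)
print(f"Acetylation vs. guanidination on an 800 Da peptide: {ppm:.1f} ppm")
```

The instrument error must stay below this separation for the two candidates to be told apart from the precursor mass alone.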

The advent of higher-performing hybrid instruments (Q-TOF, LTQ-FT and LTQ-Orbitrap) allows <10 ppm mass accuracy and provides enough resolving power to analyse PTMs. Additionally, each potentially modified amino acid significantly increases the search space that must be considered. For example, if n peptides match the precursor mass and each has on average m potentially modified sites, then instead of n peptides the search engine has to consider on average n · 2^m peptides, because every site can be either modified or unmodified.

This exponential increase in search space typically leads to an increase of false positive identifications.
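To make the growth concrete, the sketch below enumerates all phosphorylation states of a single peptide, assuming S, T and Y as modifiable residues and at most one phosphate per residue. It uses the peptide QSSVTQVTEQSPK from the AScore example further down and the lowercase "p" prefix used there for a phosphorylated residue; the function name is hypothetical.

```python
from itertools import product


def modification_variants(sequence, modifiable="STY"):
    """Yield every on/off combination of phosphorylation at the modifiable residues."""
    sites = [i for i, aa in enumerate(sequence) if aa in modifiable]
    for states in product((False, True), repeat=len(sites)):
        modified = {site for site, on in zip(sites, states) if on}
        yield "".join(
            ("p" + aa) if i in modified else aa for i, aa in enumerate(sequence)
        )


variants = list(modification_variants("QSSVTQVTEQSPK"))
print(len(variants))  # 2**5 = 32 variants for the 5 S/T residues
print(variants[:3])
```

In practice, search engines limit the maximum number of variable modifications per peptide to keep this enumeration manageable.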

Phosphoproteomics

Phosphorylated proteins play an important role in many cellular pathways. Intracellular signal transduction is primarily mediated by reversible phosphorylation. Many of the mediators of such phosphorylation, the kinases, are associated with diseases and are promising drug targets, as exemplified by the protein kinase inhibitor imatinib (Gleevec).

It is therefore not surprising that there is great interest in the study of phosphoproteins. Today, phosphoproteomics is probably the most established area of research investigating PTMs, but other modifications are gaining increasing interest.

Another reason why phosphopeptides are widely researched is technical: phosphopeptides are easy to enrich, especially compared to other modifications. This makes it possible to investigate site-specific phosphorylation, which is often substoichiometric (represented by only a small proportion of all peptides in a complete cell lysate). Because phosphorylation strongly changes a protein's conformation and pI, it can already be investigated with long-established, low-tech methods such as SDS-PAGE. Phosphorylation replaces the neutral hydroxyl groups of serines, threonines or tyrosines with negatively charged phosphate groups (HnPO4 with charge n − 3), with pKa values of 12.67, 7.21 and 2.12 for n = 1, 2, 3. Acting as weak acids, they add negative charges to the protein in solution or in the gel, which leads to a duplicated band or spot at a slightly shifted position. Mass spectrometry-based phosphoproteomics, however, gives access to high-throughput methods for the detection and localisation of phosphorylations. Together with preparative methods, these will be discussed in the following chapters.

PTM enrichment

As previously mentioned, PTMs mostly occur only at sub-stoichiometric abundances. Hence, enrichment methods are needed to supply sufficient amounts of analyte to reach the sensitivity threshold of the mass spectrometer.

The most important enrichment methods are:
- affinity-based methods (e.g. IMAC, TiO2 or strong cation exchange)
- antibody-based methods (e.g. anti-phosphotyrosine enrichment)

Immobilized metal-affinity chromatography (IMAC)

Phosphates have a high affinity for trivalent metal ions such as Fe3+, Ga3+, Al3+ and Zr3+. These metal ions are immobilized on columns and make it possible to separate phosphopeptides. One difficulty of IMAC is that non-phosphopeptides bind if the pH during loading is not between 2 and 3.5. Another is that strongly acidic peptides (rich in E and D) also have an affinity for the metal complexes.

Titanium dioxide (TiO2) enrichment

Organic phosphates are effectively adsorbed to TiO2 under acidic conditions and desorbed under alkaline conditions. Unphosphorylated peptides are much more abundant and can still bind to TiO2 to a certain degree. To reduce this unspecific binding, dihydroxybenzoic acid (DHB) is added. It binds competitively to TiO2 and prevents the adsorption of unphosphorylated peptides. This greatly enhances selectivity.

Strong cation exchange

Strong cation exchange (SCX) uses charge to distinguish phosphorylated from unphosphorylated peptides. At pH 2.7 a (typical) tryptic peptide has charge z = +2 (N-terminal amine group + C-terminal K or R). If it is phosphorylated, one negative charge is added, which in this case results in z = +1. Using a linear salt gradient, the phosphopeptides can be enriched in the early SCX fractions. One potential problem is that multiply phosphorylated peptides (neutral or negative charge) end up in the flow-through.
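The charge argument can be written down as simple arithmetic. The sketch below is a rough approximation, not a pKa-based calculation: it assumes that at pH 2.7 all carboxyl groups are neutral, that the N-terminus and every K, R and H carry one positive charge (counting H is an assumption that goes beyond the text above), and that each phosphate group contributes one negative charge. The function name is hypothetical.

```python
def approx_charge_at_ph_2_7(sequence, n_phospho=0):
    """Approximate net charge of a peptide under SCX loading conditions."""
    # One positive charge for the N-terminal amine plus one per basic residue.
    positive = 1 + sum(sequence.count(aa) for aa in "KRH")
    # Each phosphate group is assumed to carry a single negative charge.
    return positive - n_phospho


peptide = "QSSVTQVTEQSPK"  # typical tryptic peptide ending in K
for n in range(3):
    print(n, "phosphate group(s): z =", approx_charge_at_ph_2_7(peptide, n))
```

Under these assumptions the doubly phosphorylated form is neutral, which matches the observation that multiply phosphorylated peptides are not retained and appear in the flow-through.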

Antibody enrichment

Immunopurification is performed with immobilized anti-phospho-Y antibodies. The antibody-based method is well established for Y, but not for other residues. It is limited in throughput and hard to automate.

Phosphosite detection

MS techniques for phosphopeptide detection

Detection of phosphopeptides by MS is difficult because:
- Phosphopeptides are of very low abundance
- They have low MS response values
- They show inadequate fragmentation patterns

That is why alternative methods emerged:
- Precursor ion scanning and reporter ions
- Neutral loss dependent MSn
- Alternative fragmentation methods (e.g. ETD)

Precursor / reporter ion scanning

Triple-Q instruments can be used to detect the diagnostic fragment ion at m/z 79 (PO3−) using precursor ion scanning in negative mode. The method is very sensitive, however:
- fragment spectra recorded in negative mode are of poor quality
- it is slower, as switching between positive and negative ionization modes takes time

For the analysis of Y-phosphorylation, reporter ion scanning is used to detect the pY immonium ion (cleavage on either side of pY) at m/z 216.043, which is very specific for pY. The pY immonium ion is mass deficient due to its high content of O and P, so high-resolution instruments can easily distinguish it from other fragment ions with the same nominal mass.

Neutral loss dependent MSn

Fragmentation (using CID) occurs via the lowest-energy dissociation pathway (e.g. the O-P bond in S- and T-phosphorylated peptides). Often, poor coverage and low intensity of the peptide backbone ions are observed in the MS/MS spectrum. In contrast, a neutral loss peak clearly stands out. On modern mass spectrometers an additional MS event (MS3) can be triggered on the neutral loss peak at [m/z of the precursor − neutral loss (98 Da)/z].
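The position of the neutral loss peak follows directly from the expression above. The sketch below assumes the loss of H3PO4 (monoisotopic mass about 97.9769 Da) and a peak list given as (m/z, intensity) tuples; the function names and the 0.5 m/z tolerance are illustrative choices, not instrument settings.

```python
H3PO4_MONO = 97.976895  # monoisotopic mass of phosphoric acid in Da


def neutral_loss_mz(precursor_mz, charge, loss_mass=H3PO4_MONO):
    """Expected m/z of the precursor after losing a neutral fragment."""
    return precursor_mz - loss_mass / charge


def find_neutral_loss_peak(peaks, precursor_mz, charge, tol=0.5):
    """Return the most intense (m/z, intensity) peak near the expected position."""
    target = neutral_loss_mz(precursor_mz, charge)
    candidates = [peak for peak in peaks if abs(peak[0] - target) <= tol]
    return max(candidates, key=lambda peak: peak[1]) if candidates else None


# Toy MS/MS spectrum of a doubly charged precursor at m/z 800.0.
spectrum = [(400.2, 10.0), (751.01, 900.0), (751.30, 50.0)]
print(neutral_loss_mz(800.0, 2))
print(find_neutral_loss_peak(spectrum, 800.0, 2))
```

A peak found this way would be the candidate selected for the additional MS3 event.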

Identification

Determining which amino acid is phosphorylated can be difficult. Search engines often correctly report the presence of a phosphopeptide but fail to localize the phosphorylation site if more than one residue can potentially carry the phosphate group.

Reminder: In eukaryotes, phosphate groups are predominantly attached to S, T and Y residues (other phosphorylated amino acids exist, e.g. H, K or R, but are very rare).

The information on the phosphorylation site is contained in the so-called site-determining ions. Only these ions show a mass shift in the MS2 spectrum that can be used to determine the correct phosphorylation site.

AScore algorithm

The AScore (Ambiguity Score) algorithm is one of the first algorithms addressing the problem of modification site localization. It consists of two main steps:
- Determine the most likely site locations
- Use site-determining ions to calculate the probability for correct assignment

Example: 13-residue peptide from Zinc finger protein 638: QSSVTQVTEQSPK

Our first observation is that several phosphorylation site assignments are possible if we allow for phosphorylated S, T and Y in our search. Now consider that the two best, but identically scoring, hits returned by the search engine are QSSVTQVpTEQSPK and QSSVTQVTEQpSPK. Which one is the correct phosphorylation site?

Step 1: Determine the most likely phosphorylation site
- The MS/MS spectrum is separated into windows of 100 m/z units.
- The i most intense peaks per window, with i <= 10, are retained to obtain a spectrum of peak depth i (per 100 m/z).
- Predicted b- and y-ions (the theoretical spectrum) are then matched against the spectrum, and P is calculated for all 10 peak depths.
- The cumulative binomial probability P is calculated for every peak depth from the number of trials N, the number of successes n and the probability of success p:

P = Σ_{k=n}^{N} C(N, k) · p^k · (1 − p)^(N − k)

P is the probability of matching at least the given number of fragment ions by chance; the total number of trials (N) equals the number of predicted fragment ions for the peptide; the number of successes (n) is the number of predicted ions matched in the respective spectrum; p is i/100 for a spectrum of peak depth i.

Each of the 10 probabilities (one for each peak depth) of a phosphorylated peptide is then negative log transformed to a score: Score_i = −10 · log10(P)
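The scoring in step 1 can be sketched in a few lines. This is an illustration of the cumulative binomial probability and its log transform as described above, not the OpenMS implementation; the function names and the example match counts are made up for the demonstration.

```python
from math import comb, log10


def cumulative_binomial(N, n, p):
    """Probability of at least n successes in N trials with success probability p."""
    return sum(comb(N, k) * p**k * (1 - p) ** (N - k) for k in range(n, N + 1))


def peptide_score(N, n, peak_depth):
    """Negative log-transformed probability for one peak depth i (p = i / 100)."""
    p = peak_depth / 100
    return -10 * log10(cumulative_binomial(N, n, p))


# A 13-residue peptide has 12 b- and 12 y-ions, so N = 24 predicted fragments.
# Hypothetical numbers of matched ions at peak depths 1 to 10:
matches_per_depth = [4, 6, 8, 9, 10, 10, 11, 11, 12, 12]
scores = [
    peptide_score(24, n, depth)
    for depth, n in enumerate(matches_per_depth, start=1)
]
print([round(s, 1) for s in scores])
```

More matched ions at the same peak depth give a smaller P and therefore a higher score, while a larger peak depth makes random matches more likely and is penalised through the larger p.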

Step 2: Use site-determining ions to calculate the probability for correct assignment. In the case of a single phosphorylation site:
- The best and second-best scoring phosphosite assignments are determined based on a weighted peptide score (linear weighting of all 10 scores results in a single peptide score).
- The site-determining ions that distinguish the best from the second-best site are determined.
- The cumulative binomial probability is used again, but this time only on the site-determining ions.
- The peak depth that results in the maximum difference between both site assignments is taken. This is the filtering that leads to the best discrimination, i.e. it retains true peaks but filters noise.
- Both scores are negative log transformed. The AScore is the difference between both scores.

A high AScore means that there is little evidence in the spectrum (peaks) supporting a different localization of the phosphosite. If more than one site is phosphorylated, the algorithm is a bit more complicated, but the principle remains the same.
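For the example peptide, the site-determining ions can be found by comparing the theoretical fragment masses of both candidates. The sketch below considers singly charged b- and y-ions only and uses standard monoisotopic residue masses. It is a simplified illustration and not the OpenMS implementation; all function names are hypothetical, and the matched-ion counts in the final call are invented.

```python
from math import comb, log10

# Monoisotopic residue masses (Da) for the residues in the example peptide.
RESIDUE = {"Q": 128.058578, "S": 87.032028, "V": 99.068414, "T": 101.047679,
           "E": 129.042593, "P": 97.052764, "K": 128.094963}
PHOSPHO, WATER, PROTON = 79.966331, 18.010565, 1.007276


def fragment_ions(sequence, phospho_index):
    """Singly charged b- and y-ion m/z values for one phosphosite (0-based index)."""
    masses = [RESIDUE[aa] + (PHOSPHO if i == phospho_index else 0.0)
              for i, aa in enumerate(sequence)]
    ions = {}
    for k in range(1, len(sequence)):
        ions[f"b{k}"] = sum(masses[:k]) + PROTON
        ions[f"y{k}"] = sum(masses[-k:]) + WATER + PROTON
    return ions


def site_determining_ions(sequence, site_a, site_b, tol=1e-6):
    """Names of the ions whose m/z differs between the two site assignments."""
    ions_a = fragment_ions(sequence, site_a)
    ions_b = fragment_ions(sequence, site_b)
    return sorted(name for name in ions_a
                  if abs(ions_a[name] - ions_b[name]) > tol)


def ascore(matched_best, matched_second, n_site_ions, peak_depth):
    """Difference of the -10*log10(P) scores computed on site-determining ions."""
    p = peak_depth / 100

    def score(n):
        P = sum(comb(n_site_ions, k) * p**k * (1 - p) ** (n_site_ions - k)
                for k in range(n, n_site_ions + 1))
        return -10 * log10(P)

    return score(matched_best) - score(matched_second)


# pT8 (index 7) versus pS11 (index 10) in QSSVTQVTEQSPK.
sd_ions = site_determining_ions("QSSVTQVTEQSPK", 7, 10)
print(sd_ions)
print(ascore(matched_best=5, matched_second=1,
             n_site_ions=len(sd_ions), peak_depth=5))
```

Only fragments that contain exactly one of the two candidate residues carry localization information, which is why most b- and y-ions drop out of the comparison.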

Implemented in OpenMS

timosachsenberg, Mar 09 '23 15:03