pyopenms-docs Potential useful information to review and add to the readthedocs page

Potential useful information to review and add to the readthedocs page - from learning unit 7

Open timosachsenberg opened this issue 1 year ago • 3 comments

Workflow

The search engine plays the central role in the peptide identification process. As illustrated below, it works in three steps. First, the provided protein sequences are digested in silico according to the chosen database settings. Second, a theoretical mass list is generated and compared with the experimental mass list. Finally, the resulting list of matched masses is passed on to statistical significance analysis.
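The first two steps can be sketched in plain Python. This is a minimal illustration, not the pyOpenMS implementation: the function names are hypothetical, the cleavage rule is the common simplification "cleave after K or R unless followed by P", and no missed cleavages or modifications are considered.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # mass of H2O added to the sum of residues


def digest_trypsin(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def theoretical_mass_list(proteins):
    """Sorted (mass, peptide) pairs for all digested proteins."""
    return sorted(
        (peptide_mass(pep), pep) for prot in proteins for pep in digest_trypsin(prot)
    )


if __name__ == "__main__":
    for mass, pep in theoretical_mass_list(["AKRPGKMM"]):
        print(f"{pep}\t{mass:.4f}")
```

The sorted list is what a search engine would then compare against the experimental masses.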

Search parameters

The search and the presentation of results are controlled by several search parameters. Some prominent examples are:
• The enzyme used for digestion.
• The modifications to consider: fixed or variable modifications (introduced in detail in LU7B).
• How masses are specified: with or without (positive or negative) charge, and as monoisotopic or average mass.
• The maximum number of missed cleavage sites per peptide.
• The tolerance for comparing protein masses.
• The tolerance for comparing peptide masses. This value depends on the expected accuracy of the MS instrument; since no mass spectrometer has perfect accuracy, this parameter is always specified.
• Many other options.
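Three of these parameters translate directly into small computations. The sketch below uses hypothetical helper names and assumes that charge comes from added or removed protons and that the tolerance is given in ppm; real search engines expose these settings in their own way.

```python
PROTON = 1.007276  # mass of a proton in Da


def neutral_mass(mz, charge):
    """Convert an observed m/z to a neutral mass.

    charge > 0: [M + zH]^z+, charge < 0: [M - zH]^z-, charge == 0: already neutral.
    """
    if charge == 0:
        return mz
    return abs(charge) * mz - charge * PROTON


def ppm_to_da(mass, ppm):
    """Absolute tolerance in Da that corresponds to a relative tolerance in ppm."""
    return mass * ppm * 1e-6


def with_missed_cleavages(pieces, max_missed):
    """Join up to max_missed + 1 consecutive fully cleaved pieces."""
    peptides = []
    for i in range(len(pieces)):
        for n in range(max_missed + 1):
            if i + n + 1 <= len(pieces):
                peptides.append("".join(pieces[i:i + n + 1]))
    return peptides


if __name__ == "__main__":
    print(neutral_mass(500.0, 2))
    print(ppm_to_da(1000.0, 10.0))
    print(with_missed_cleavages(["AK", "RPGK", "MM"], max_missed=1))
```

A ppm tolerance grows with the mass, which is why it is often preferred over a fixed Da window for high-resolution instruments.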

Organization of the database

To quickly retrieve the protein sequences of interest (filtering), the database can be reorganized or supplemented with an explicit set of index tables. Each protein mass in such an index table has pointers to the protein sequences with that mass. Another way to increase speed is to save the results of the in silico digestion. The theoretical masses obtained are then sorted, and each mass is given indices that point to the sequences in which it occurs, together with some peptide information (modifications etc.).
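A sorted mass index of this kind can be sketched with the standard `bisect` module. The tuple layout `(mass, peptide, protein_id)` and the function names are assumptions made for this illustration; they do not describe how any particular search engine stores its index.

```python
import bisect


def build_index(entries):
    """entries: iterable of (mass, peptide, protein_id) from a saved digestion.

    Returns the sorted masses plus the entries in the same order, so that a
    position in the mass list points back to the peptide and its protein.
    """
    ordered = sorted(entries)
    masses = [entry[0] for entry in ordered]
    return masses, ordered


def lookup(index, mass, tol):
    """All index entries whose mass lies within [mass - tol, mass + tol]."""
    masses, ordered = index
    lo = bisect.bisect_left(masses, mass - tol)
    hi = bisect.bisect_right(masses, mass + tol)
    return ordered[lo:hi]


if __name__ == "__main__":
    index = build_index([
        (231.1331, "GR", "prot2"),
        (217.1426, "AK", "prot1"),
        (217.1426, "KA", "prot3"),
    ])
    print(lookup(index, 217.14, 0.01))
```

Because the masses are sorted once, each lookup needs only two binary searches instead of a scan over the whole database.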

Search engines

Specifically, the peptide identification process consists of the following steps:
1. From the database, extract all sequences that fit the precursor mass of the MS2 spectrum within a given error tolerance.
2. For each of these candidates, generate a theoretical spectrum.
3. Align / compare all theoretical spectra to the experimental spectrum.
4. Score the alignments and rank the candidates according to the score.
5. Assume that the top-ranked candidate is the correct peptide-spectrum match (PSM).

Extract all candidates (search space)

In this stage, an experimental spectrum S is given, and we want to identify the correct sequence for S in a given protein database. First, the search space for S is defined for a given mass tolerance d. Let m_prec be the mass of the precursor ion of spectrum S. From the database, extract all peptide sequences with mass m_cand such that |m_prec − m_cand| ≤ d.

This set of candidates is defined as the search space for spectrum S and denoted Ω_S.
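The definition of the search space maps directly onto a filter. In this sketch, the function name and the `(peptide, mass)` pair layout are hypothetical, and d is taken as an absolute tolerance in Da.

```python
def search_space(m_prec, candidates, d):
    """Return all (peptide, m_cand) pairs with |m_prec - m_cand| <= d."""
    return [(pep, m_cand) for pep, m_cand in candidates if abs(m_prec - m_cand) <= d]


if __name__ == "__main__":
    database_peptides = [("AK", 217.1426), ("GR", 231.1331), ("KA", 217.1426)]
    omega_s = search_space(217.14, database_peptides, d=0.01)
    print(omega_s)
```

Note that "AK" and "KA" both end up in the search space: the precursor mass alone cannot distinguish peptides with the same amino acid composition.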

Generate theoretical spectra

There are two options for generating theoretical spectra. The first is to generate only the fragment masses expected in the MS2 spectrum; the second is to also model the fragment ion intensities. Note that the generated theoretical spectrum T usually has uniform intensities.
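The masses-only option can be sketched as follows. This assumes singly charged b and y ions of an unmodified peptide and assigns every peak the same intensity of 1.0; the function name is hypothetical and this is not how pyOpenMS generates its theoretical spectra internally.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def theoretical_spectrum(peptide):
    """Sorted (m/z, intensity, label) peaks for singly charged b and y ions."""
    peaks = []
    for i in range(1, len(peptide)):
        # b ion: N-terminal prefix of length i plus one proton.
        b_mz = sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON
        # y ion: C-terminal suffix of length i plus water plus one proton.
        y_mz = sum(RESIDUE_MASS[aa] for aa in peptide[-i:]) + WATER + PROTON
        peaks.append((b_mz, 1.0, f"b{i}"))
        peaks.append((y_mz, 1.0, f"y{i}"))
    return sorted(peaks)


if __name__ == "__main__":
    for mz, intensity, label in theoretical_spectrum("PEPTIDEK"):
        print(f"{label}\t{mz:.4f}\t{intensity}")
```

Modelling intensities would replace the constant 1.0 with a predicted value per fragment, which is outside the scope of this sketch.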

Comparison to experimental spectra

The main task is to compare two lists of masses. The straightforward approach is to sort both lists by mass and perform a parallel comparison. Some aspects that have to be taken into account are:
• An experimental mass may match more than one theoretical peptide mass within the given threshold.
• A theoretical mass may match more than one experimental peptide mass.
• A theoretical mass may match both an unmodified peptide and a second, modified peptide.
• Both a concatenated theoretical peptide (missed cleavages) and one of its parts may find matches.
• Some of the experimental masses may come from noise.
• Different peptides can have similar masses, due to permutations of the amino acids.
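The parallel comparison of two sorted lists can be sketched with a moving lower bound. The function name is hypothetical and the tolerance is an absolute value in Da. The sketch reports every pair within tolerance, so one experimental mass can match several theoretical masses and vice versa.

```python
def match_masses(experimental, theoretical, tol):
    """Return all (experimental, theoretical) pairs that differ by at most tol."""
    exp = sorted(experimental)
    theo = sorted(theoretical)
    matches = []
    start = 0  # first theoretical mass that can still match the current or a later peak
    for e in exp:
        # Theoretical masses below e - tol cannot match e or anything after it.
        while start < len(theo) and theo[start] < e - tol:
            start += 1
        j = start
        while j < len(theo) and theo[j] <= e + tol:
            matches.append((e, theo[j]))
            j += 1
    return matches


if __name__ == "__main__":
    experimental = [100.0, 200.0, 200.012, 350.0]
    theoretical = [100.004, 200.005, 300.0]
    for pair in match_masses(experimental, theoretical, tol=0.01):
        print(pair)
```

In the example, the theoretical mass 200.005 is matched by two experimental masses, while 350.0 stays unmatched and could be noise.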

Thus, for each experimental mass there can be a number of false matches (matches to peptides other than the correct one), and this number depends on the accuracy of the measurements.

Scoring of peptide candidates

There are numerous tools for comparing theoretical and experimental spectra of candidate peptides. The main difference between search engines lies in how their scoring schemes are implemented, which leads to differences in runtime and performance. Conceptually, however, all search engine algorithms are based on fragment ion comparison.
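As an illustration of fragment ion comparison, the sketch below ranks candidates by a shared peak count, i.e. the number of theoretical fragments that have an experimental peak within tolerance. This is a deliberately simple stand-in, not the scoring scheme of any specific search engine, and the function names are hypothetical.

```python
import bisect


def shared_peak_count(experimental_mz, theoretical_mz, tol):
    """Number of theoretical fragments with an experimental peak within tol."""
    exp = sorted(experimental_mz)
    count = 0
    for t in theoretical_mz:
        i = bisect.bisect_left(exp, t - tol)  # first peak at or above t - tol
        if i < len(exp) and exp[i] <= t + tol:
            count += 1
    return count


def rank_candidates(experimental_mz, candidates, tol):
    """candidates: dict mapping peptide -> list of theoretical fragment m/z.

    Returns (peptide, score) pairs, best score first.
    """
    scored = [
        (pep, shared_peak_count(experimental_mz, fragments, tol))
        for pep, fragments in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    experimental = [72.04, 147.11, 300.0]
    candidates = {
        "GK": [58.028736, 147.112801],
        "AK": [72.044386, 147.112801],
    }
    print(rank_candidates(experimental, candidates, tol=0.02))
```

The top-ranked entry would be reported as the PSM; a real engine would additionally assess how significant that score is.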

Mar 09 '23 14:03 timosachsenberg
