
Unclear effect of the protein2genome model

Open cmayer opened this issue 3 years ago • 1 comments

The manual says: "protein2genome This model allows alignment of a protein sequence to genomic DNA. This is similar to the protein2dna model, with the addition of modelling of introns and intron phases. This model is similar to those used by genewise."

I could not identify any difference between the protein2genome and protein2dna models.

I was wondering which model to use for data that should contain mostly coding sequences, but could also contain introns, UTRs and anything beyond the genes, e.g. hybrid enrichment data for which the bait region lies within the genes, but the sequences could span beyond the coding region. Here, modeling the introns could help in principle. As far as I understand the manual, the protein2genome model should be favoured for the described scenario. How are introns "modeled" in the protein2dna and protein2genome models?

cmayer, Aug 12 '21 21:08

I believe protein2genome incorporates intron states into the model, while protein2dna is only about modeling frameshifts in a protein to DNA alignment.
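To make "intron phases" a bit more concrete: the phase of an intron is usually defined by where it interrupts the reading frame (0 = between two codons, 1 = after the first base of a codon, 2 = after the second base). The sketch below only illustrates that general definition; it is not taken from exonerate's code, and the function name and exon lengths are made up for the example.

```python
def intron_phase(coding_bases_before_intron: int) -> int:
    """Return the phase (0, 1 or 2) of an intron.

    Phase 0: the intron falls between two codons.
    Phase 1: the intron falls after the first base of a codon.
    Phase 2: the intron falls after the second base of a codon.
    """
    if coding_bases_before_intron < 0:
        raise ValueError("coding length must be non-negative")
    return coding_bases_before_intron % 3


# Hypothetical exon lengths (coding bases only), in transcript order.
exon_lengths = [90, 100, 50]

coding_so_far = 0
phases = []
for length in exon_lengths[:-1]:  # one intron follows every exon except the last
    coding_so_far += length
    phases.append(intron_phase(coding_so_far))

print(phases)  # [0, 1]
```

A model without intron states has no way to express such an interruption as an intron, so it would have to end the alignment there or explain it some other way.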

https://github.com/nathanweeks/exonerate/blob/master/doc/man/man1/exonerate.1

protein2dna
This model compares a protein sequence to a DNA sequence,
incorporating all the appropriate gaps and frameshifts.

protein2dna:bestfit
This is a bestfit version of the protein2dna model,
with which the entire protein is included in the alignment.
It is currently only available when using exhaustive alignment.

protein2genome
This model allows alignment of a protein sequence to genomic
DNA. This is similar to the protein2dna model,
with the addition of modelling of introns and intron phases.
This model is similar to those used by genewise.
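One way to check whether the two models actually behave differently on a given data set would be to run both and look at the vulgar lines (`--showvulgar`) in the output. As far as I know, the man page lists the vulgar labels `5`, `3` and `I` for splice sites and introns, so these should only show up when the model has intron states. Below is a rough sketch that counts the labels in a vulgar line. The helper names and the two example lines are invented for illustration and are not real exonerate output.

```python
from collections import Counter

# Vulgar labels that indicate an intron was modelled
# (5' splice site, 3' splice site, intron).
INTRON_LABELS = {"5", "3", "I"}


def vulgar_label_counts(line: str) -> Counter:
    """Count the operation labels in one exonerate 'vulgar:' line."""
    fields = line.split()
    if not fields or fields[0] != "vulgar:":
        raise ValueError("not a vulgar line")
    # Skip 'vulgar:' plus the 9 header fields (query id/start/end/strand,
    # target id/start/end/strand, score); the rest are
    # <label, query_length, target_length> triples.
    triples = fields[10:]
    if len(triples) % 3 != 0:
        raise ValueError("incomplete <label, query_length, target_length> triple")
    return Counter(triples[0::3])


def has_introns(line: str) -> bool:
    """True if the vulgar line contains any intron-related label."""
    return any(label in INTRON_LABELS for label in vulgar_label_counts(line))


# Invented example lines, assuming the layout described above.
with_intron = "vulgar: prot1 0 60 . contig1 100 400 + 250 M 30 90 5 0 2 I 0 116 3 0 2 M 30 90"
without_intron = "vulgar: prot1 0 60 . contig1 100 280 + 200 M 60 180"

print(has_introns(with_intron))     # True
print(has_introns(without_intron))  # False
```

If none of the protein2genome alignments contain intron labels, that may simply mean the targets do not span any introns, which would explain seeing identical results from both models.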

hyphaltip, Feb 06 '22 22:02