PHANOTATE
Truncated proteins at genome end
Hi, I am encountering cases where there are truncated proteins (missing stop codon) called by phanotate at the genome end. The common case was when there was another protein on the other side of the contig which was missing a start codon. Thus, when rotating the genome, a complete CDS would be found. However, after rotating the genome, I still see rare cases of truncated proteins at genome end for which I can't find a logical continuation on the other side of the contig. Is this the intended behavior? Should I discard these proteins in post processing? Thanks, Ilya.
It is standard for all gene callers to extend the ORFs off the ends, even without checking the other end for a possible connecting ORF:
$ prodigal -i NC_006820.fna | grep CDS | tail -n5
CDS 194317..194925
CDS 194897..195151
CDS 195163..195336
CDS 195308..195595
CDS 195718..>196278
However, phanotate should be adding chevrons (< or >) to the locations to indicate that they extend off the ends, and it currently does not:
$ phanotate.py NC_006820.fna -f genbank | grep CDS | tail -n5
CDS 194317..194925
CDS 194897..195151
CDS 195163..195336
CDS 195305..195595
CDS 195718..196278
The missing chevron on the last location (195718..196278, which prodigal reports as 195718..>196278) is a bug that I will need to fix on the current main branch.
In version 2.0, I may add a command-line argument so that genes extending off the ends are only included if a -c/--circular flag is given.
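Until then, if you want to flag these calls yourself in post-processing, something along these lines might work. This is only a rough sketch, not part of phanotate: the function names, the `margin` parameter, and the contig length used in the example are all assumptions for illustration. It treats a CDS as a candidate if its location carries a chevron, or if it starts or ends within a few bases of the contig boundary (a heuristic that can also catch complete genes sitting right at the edge, so review the hits rather than discarding them blindly).

```python
import re

# Matches simple GenBank-style locations such as "195718..>196278"
# or "complement(<1..200)". Joined locations are not handled here.
LOCATION_RE = re.compile(r"^(complement\()?(<?)(\d+)\.\.(>?)(\d+)\)?$")


def parse_location(location):
    """Split a location string into coordinates and partial flags."""
    m = LOCATION_RE.match(location.strip())
    if m is None:
        raise ValueError(f"unsupported location: {location!r}")
    return {
        "start": int(m.group(3)),
        "end": int(m.group(5)),
        "complement": m.group(1) is not None,
        "partial_start": m.group(2) == "<",
        "partial_end": m.group(4) == ">",
    }


def runs_off_end(location, contig_length, margin=3):
    """Heuristic: True if the CDS is marked partial or touches a contig edge.

    `margin` (hypothetical parameter) is how many bases from either edge
    still count as touching it.
    """
    loc = parse_location(location)
    if loc["partial_start"] or loc["partial_end"]:
        return True
    return loc["start"] <= margin or loc["end"] > contig_length - margin
```

For example, with an assumed contig length of 196280, `runs_off_end("195718..196278", 196280)` flags the last CDS from the output above even though the chevron is missing, while `runs_off_end("195305..195595", 196280)` does not flag the one before it.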