bioperl-live

Start codon handling in 1.7.x

Open · nerdstrike opened this issue 7 years ago · 8 comments

The following fix was applied to 1.7 releases to deal with NNN mapping to a stop codon.

https://github.com/bioperl/bioperl-live/commit/260ebb98281ce5580c9c1b9808d3167688c7626c

Ensembl also encounters unexpected behaviour in the handling of start codons: Bio::Tools::CodonTable::is_start_codon('NNN') returns true. The cause is the same, i.e. the inversion of the default behaviour, so I hope the solution will be pretty similar!

I believe this also applies to 'NN' and other non-triplet equivalents.
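A minimal, self-contained Perl sketch of the stricter behaviour being asked for, assuming an ambiguous codon should count as a start only when every unambiguous codon it can expand to is a start, and that anything other than a full triplet is never a start. It does not use or reproduce Bio::Tools::CodonTable internals: `expand_codon`, `strict_is_start`, and the hard-coded start set (ATG, TTG, CTG, taken from NCBI translation table 1) are hypothetical, for illustration only.

```perl
use strict;
use warnings;

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
my %IUPAC = (
    A => 'A', C => 'C', G => 'G', T => 'T', U => 'T',
    R => 'AG', Y => 'CT', S => 'CG', W => 'AT', K => 'GT', M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Assumption: the start codons of NCBI translation table 1 (Standard).
my %START = map { $_ => 1 } qw(ATG TTG CTG);

# Expand a possibly ambiguous codon into every unambiguous codon it covers.
# Returns an empty list if it contains a symbol that is not an IUPAC code.
sub expand_codon {
    my ($codon) = @_;
    my @out = ('');
    for my $base (split //, uc $codon) {
        my $set = $IUPAC{$base} or return;
        my @next;
        for my $prefix (@out) {
            push @next, $prefix . $_ for split //, $set;
        }
        @out = @next;
    }
    return @out;
}

# Hypothetical conservative check: true only for a full triplet whose
# every possible reading is a start codon.
sub strict_is_start {
    my ($codon) = @_;
    return 0 unless defined $codon && length($codon) == 3;
    my @all = expand_codon($codon);
    return 0 unless @all;
    for my $c (@all) {
        return 0 unless $START{$c};
    }
    return 1;
}

for my $codon (qw/NNN NN N ATG NTG YTG TAG/) {
    printf "%-3s  %s\n", $codon, strict_is_start($codon) ? 'start' : 'not a start';
}
```

Under this rule NNN, NN, N and NTG all come back as not a start, while YTG still counts because both of its readings (CTG and TTG) are starts in table 1. Returning a plain false for partial ambiguity is the conservative choice; a caller that wants "maybe" needs a separate tri-state answer.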

nerdstrike, Jan 30 '18 11:01

@nerdstrike, do you have any code or examples we can test against? That generally helps 'speed' things along.

cjfields, Feb 07 '18 21:02


```perl
use strict;
use warnings;
use Bio::Tools::CodonTable;

# Default table (the Standard code).
my $table = Bio::Tools::CodonTable->new();

# Classify ambiguous, non-triplet and ordinary codons.
for my $codon (qw/NNN NN N ATG TAG TAA TGA/) {
  if ($table->is_start_codon($codon)) {
    print "$codon is a start codon\n";
  } elsif ($table->is_ter_codon($codon)) {
    print "$codon is a stop codon\n";
  } else {
    print "$codon is nonsense\n";
  }
}
```

Output from 1.2.3 and 1.6.924:

```
NNN is nonsense
NN is nonsense
N is nonsense
ATG is a start codon
TAG is a stop codon
TAA is a stop codon
TGA is a stop codon
```

Output from 1.7.2:

```
NNN is a start codon <-----------
NN is nonsense
N is nonsense
ATG is a start codon
TAG is a stop codon
TAA is a stop codon
TGA is a stop codon
```

It's a niche problem, but it occurs if you're fishing for start codons and don't know you're in a masked region or out of phase.

This also applies to NTG. It could be a start codon, yes, but the answer should be "maybe" at best.

nerdstrike, Feb 08 '18 10:02

NTG = maybe:
p = 0.25 for eukaryotes, p > 0.25 for bacteria :-)
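To put a number on "maybe", here is a small Perl sketch that reports what fraction of a codon's possible readings are starts, treating every reading as equally likely. That uniform weighting is an assumption; real start codon usage is far from uniform, which is the point of the joke above. With an ATG-only start set, NTG scores 0.25; with the three NCBI table 1 starts (ATG, TTG, CTG) it scores 0.75. `start_fraction` and both start sets are hypothetical and not part of BioPerl.

```perl
use strict;
use warnings;

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
my %IUPAC = (
    A => 'A', C => 'C', G => 'G', T => 'T', U => 'T',
    R => 'AG', Y => 'CT', S => 'CG', W => 'AT', K => 'GT', M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Fraction of the unambiguous codons covered by $codon that are in %$starts,
# assuming each expansion is equally likely. Non-triplets and codons with
# non-IUPAC symbols score 0.
sub start_fraction {
    my ($codon, $starts) = @_;
    return 0 unless defined $codon && length($codon) == 3;
    my @all = ('');
    for my $base (split //, uc $codon) {
        my $set = $IUPAC{$base} or return 0;
        my @next;
        for my $prefix (@all) {
            push @next, $prefix . $_ for split //, $set;
        }
        @all = @next;
    }
    my $hits = grep { $starts->{$_} } @all;
    return $hits / scalar @all;
}

# Hypothetical start sets for illustration.
my %atg_only = map { $_ => 1 } qw(ATG);
my %table1   = map { $_ => 1 } qw(ATG TTG CTG);   # NCBI table 1 starts

printf "NTG, ATG only: %.2f\n", start_fraction('NTG', \%atg_only);   # 0.25
printf "NTG, table 1:  %.2f\n", start_fraction('NTG', \%table1);     # 0.75
printf "NNN, table 1:  %.4f\n", start_fraction('NNN', \%table1);
```

A fraction like this could back a tri-state answer: 1 means definitely a start, 0 definitely not, and anything in between is "maybe".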

tseemann, Feb 15 '18 07:02

Yep. I’ve seen all of those, including CTG. I’m sure you have as well ;-)

cjfields, Feb 15 '18 18:02

Not just bacteria

http://www.jbc.org/content/285/7/4595.full http://genome.cshlp.org/content/28/1/25.full

Curious about the incidence of CTG in bacteria with experimental confirmation. Citation?

jimhu-tamu, Feb 15 '18 18:02

It's been a while, but I think it was something from either Mycobacterium or Streptomyces.

cjfields, Feb 15 '18 18:02

Whoops. Didn't realize GitHub would let me edit your reply when I meant to edit mine.

My colleague told me that CTG starts are known in phage T4.

jimhu-tamu, Feb 15 '18 18:02

I suppose I should clarify that my issue arises with the default codon table, but it could nonetheless arise with alternate tables too.

nerdstrike, Feb 16 '18 15:02