
Inconsistent start score computed for some genes

Open althonos opened this issue 3 years ago • 1 comments

While adding some tests for the GFF output (in order to fix #18), I noticed that the start score (sscore) of some genes deviates from the Prodigal reference results. This had not been verified before, since GFF is the only output format that contains these statistics. The change in start score marginally affects the overall score and the confidence of each gene.

Genes scored with Prodigal:

NODE_23_length_79939_cov_26.984653	Prodigal_v2.6.3	CDS	1	177	8.4	-	0	ID=1_1;partial=10;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.237;conf=90.13;score=9.62;cscore=10.74;sscore=-1.12;rscore=-5.22;uscore=-1.07;tscore=3.94;
NODE_23_length_79939_cov_26.984653	Prodigal_v2.6.3	CDS	168	386	25.1	-	0	ID=1_2;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.251;conf=99.77;score=26.33;cscore=27.03;sscore=-0.70;rscore=-6.04;uscore=0.68;tscore=3.41;
NODE_23_length_79939_cov_26.984653	Prodigal_v2.6.3	CDS	389	1483	186.7	-	0	ID=1_3;partial=00;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.254;conf=99.99;score=186.70;cscore=168.23;sscore=18.47;rscore=14.49;uscore=0.04;tscore=3.94;
NODE_23_length_79939_cov_26.984653	Prodigal_v2.6.3	CDS	1632	2981	218.9	-	0	ID=1_4;partial=00;start_type=ATG;rbs_motif=AGGAGG;rbs_spacer=3-4bp;gc_cont=0.296;conf=99.99;score=218.26;cscore=200.52;sscore=17.74;rscore=14.49;uscore=-0.04;tscore=3.94;
NODE_23_length_79939_cov_26.984653	Prodigal_v2.6.3	CDS	3569	3925	25.5	+	0	ID=1_5;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.266;conf=99.72;score=25.49;cscore=21.09;sscore=4.41;rscore=1.46;uscore=-1.00;tscore=3.94;

Genes scored with Pyrodigal v0.6.4:

NODE_23_length_79939_cov_26.984653_1	pyrodigal_v0.6.4	CDS	1	177	8.4	-	0	ID=1_1;partial=10;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.237;conf=90.13;score=9.62;cscore=10.74;sscore=-1.12;rscore=-5.22;uscore=-1.07;tscore=3.94;
NODE_23_length_79939_cov_26.984653_2	pyrodigal_v0.6.4	CDS	168	386	25.1	-	0	ID=1_2;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.251;conf=99.77;score=26.33;cscore=27.03;sscore=-0.70;rscore=-6.04;uscore=0.68;tscore=3.41;
NODE_23_length_79939_cov_26.984653_3	pyrodigal_v0.6.4	CDS	389	1483	186.7	-	0	ID=1_3;partial=00;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.254;conf=99.99;score=186.70;cscore=168.23;sscore=18.47;rscore=14.49;uscore=0.04;tscore=3.94;
NODE_23_length_79939_cov_26.984653_4	pyrodigal_v0.6.4	CDS	1632	2981	218.9	-	0	ID=1_4;partial=00;start_type=ATG;rbs_motif=AGGAGG;rbs_spacer=3-4bp;gc_cont=0.296;conf=99.99;score=218.26;cscore=200.52;sscore=17.74;rscore=14.49;uscore=-0.04;tscore=3.94;
NODE_23_length_79939_cov_26.984653_5	pyrodigal_v0.6.4	CDS	3569	3925	25.5	+	0	ID=1_5;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.266;conf=99.72;score=25.49;cscore=21.09;sscore=4.41;rscore=1.46;uscore=-1.00;tscore=3.94;

Genes scored with Pyrodigal v1.1.2:

NODE_23_length_79939_cov_26.984653_1	pyrodigal_v1.1.2	CDS	1	177	8.4	-	0	ID=1_1;partial=10;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.237;conf=87.32;score=8.39;cscore=10.74;sscore=-2.35;rscore=-5.22;uscore=-1.07;tscore=3.94
NODE_23_length_79939_cov_26.984653_2	pyrodigal_v1.1.2	CDS	168	386	25.1	-	0	ID=1_2;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.251;conf=99.69;score=25.07;cscore=27.03;sscore=-1.96;rscore=-6.04;uscore=0.68;tscore=3.41
NODE_23_length_79939_cov_26.984653_3	pyrodigal_v1.1.2	CDS	389	1483	186.7	-	0	ID=1_3;partial=00;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.254;conf=99.99;score=186.70;cscore=168.23;sscore=18.47;rscore=14.49;uscore=0.04;tscore=3.94
NODE_23_length_79939_cov_26.984653_4	pyrodigal_v1.1.2	CDS	1632	2981	218.9	-	0	ID=1_4;partial=00;start_type=ATG;rbs_motif=AGGAGG;rbs_spacer=3-4bp;gc_cont=0.296;conf=99.99;score=218.91;cscore=200.52;sscore=18.39;rscore=14.49;uscore=-0.04;tscore=3.94
NODE_23_length_79939_cov_26.984653_5	pyrodigal_v1.1.2	CDS	3569	3925	25.5	+	0	ID=1_5;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.266;conf=99.72;score=25.49;cscore=21.09;sscore=4.41;rscore=1.46;uscore=-1.00;tscore=3.94

After bisecting, I found that the bug was introduced between v0.6.4 and v0.7.0.
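For anyone who wants to reproduce the comparison, here is a minimal sketch that diffs the attribute column of two GFF rows. The two rows are the first gene from the Prodigal and Pyrodigal v1.1.2 listings above; the helper names `parse_attributes` and `diff_attributes` are made up for this example and are not part of Pyrodigal.

```python
def parse_attributes(gff_line):
    """Return the key=value pairs of GFF column 9 as a dict of strings."""
    column9 = gff_line.rstrip("\n").split("\t")[8]
    # Prodigal ends the column with ';', which yields an empty field to skip.
    return dict(field.split("=", 1) for field in column9.split(";") if field)


def diff_attributes(reference, candidate):
    """Map each attribute that differs to a (reference, candidate) pair."""
    ref = parse_attributes(reference)
    cand = parse_attributes(candidate)
    return {key: (ref[key], cand.get(key)) for key in ref if ref[key] != cand.get(key)}


# First gene of the contig, copied from the listings above.
prodigal = "\t".join([
    "NODE_23_length_79939_cov_26.984653", "Prodigal_v2.6.3", "CDS",
    "1", "177", "8.4", "-", "0",
    "ID=1_1;partial=10;start_type=ATG;rbs_motif=None;rbs_spacer=None;"
    "gc_cont=0.237;conf=90.13;score=9.62;cscore=10.74;sscore=-1.12;"
    "rscore=-5.22;uscore=-1.07;tscore=3.94;",
])
pyrodigal = "\t".join([
    "NODE_23_length_79939_cov_26.984653_1", "pyrodigal_v1.1.2", "CDS",
    "1", "177", "8.4", "-", "0",
    "ID=1_1;partial=10;start_type=ATG;rbs_motif=None;rbs_spacer=None;"
    "gc_cont=0.237;conf=87.32;score=8.39;cscore=10.74;sscore=-2.35;"
    "rscore=-5.22;uscore=-1.07;tscore=3.94",
])

changed = diff_attributes(prodigal, pyrodigal)
for key, (ref_value, new_value) in changed.items():
    print(f"{key}: {ref_value} -> {new_value}")

# The shift in the overall score matches the shift in the start score.
score_delta = round(float(changed["score"][1]) - float(changed["score"][0]), 2)
sscore_delta = round(float(changed["sscore"][1]) - float(changed["sscore"][0]), 2)
```

On this row only `conf`, `score` and `sscore` differ, and the two deltas are equal, which fits the start score being the only component that changed.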

althonos, Oct 21 '22 20:10

It looks like the bug may be coming from a weird Prodigal behaviour, and only occurs in metagenomic mode.

In the original Prodigal code, the gene data string is created as soon as the best genes are identified, but the nodes may still be modified after that, so the gene data string can disagree with the actual attributes of the start node. This only occurs for genes that have been corrected with eliminate_bad_genes.
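To illustrate the kind of discrepancy described here, this is a toy model of a string snapshot going stale. It is not Prodigal's or Pyrodigal's code: the `Node` class, the `format_gene_data` helper, and the numeric values are all hypothetical, and the adjustment simply stands in for whatever a later correction step does to the node.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a start node carrying a start score."""
    sscore: float


def format_gene_data(node):
    """Render the node's start score the way a GFF attribute would show it."""
    return f"sscore={node.sscore:.2f}"


node = Node(sscore=1.50)

# The string is built once, when the gene is first selected.
gene_data = format_gene_data(node)

# A later correction step (hypothetical adjustment) mutates the node itself.
node.sscore -= 0.25

# The snapshot still carries the old value; re-rendering gives the new one.
stale = gene_data
fresh = format_gene_data(node)
print(stale, "vs", fresh)
```

An implementation that formats the output from the live node attributes and one that reuses the earlier snapshot would therefore report different start scores for the same gene.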

althonos, Oct 21 '22 23:10