MitoFinder
MitoFinder copied to clipboard
Missing gene - blastn evalue rounded to 0.0
I am running mitofinder on assembled contigs to search for mtDNA/Genomes. However, I am missing CO1 from the output. I thought maybe I just didn't capture that mtGene. However, to confirm my suspicions, I ran blastn using the same 5 reference genes and found a contig or two that should contain that gene (three hits actually, but the longest blast alignment fits the expected length of 1.5k bp). For some reason the tmp files directory generated by mitofinder has the CO1 blast output empty (actually all of them are) so I cannot check the blast scores used by mitofinder.
Here is an example of my sanity check where all CO1 evalues are 0.0
contig_2018 NC_036507.1@COX1 91.063 884 77 2 1 883 364 1246 0.0 1194
contig_2018 NC_046469.1@COX1 90.611 884 81 2 1 883 364 1246 0.0 1171
contig_2018 NC_016469.1@COX1 90.498 884 82 2 1 883 364 1246 0.0 1166
contig_2018 NC_018339.1@COX1 89.581 883 92 0 1 883 355 1237 0.0 1122
contig_2018 NC_044759.1@COX1 89.052 886 91 3 1 883 364 1246 0.0 1094
...
contig_1535 NC_036507.1@COX1 90.537 1173 102 4 15804 16968 1 1172 0.0 1543
contig_1535 NC_044759.1@COX1 89.608 1174 115 6 15804 16970 1 1174 0.0 1485
contig_1535 NC_016469.1@COX1 89.632 1167 112 4 15804 16962 1 1166 0.0 1476
contig_1535 NC_046469.1@COX1 89.632 1167 112 4 15804 16962 1 1166 0.0 1476
contig_1535 NC_018339.1@COX1 89.261 1164 116 5 15813 16968 1 1163 0.0 1448
...
contig_2876 NC_016469.1@COX1 90.944 1546 138 2 3549 5093 1 1545 0.0 2078
contig_2876 NC_036507.1@COX1 90.980 1541 137 2 3549 5088 1 1540 0.0 2074
contig_2876 NC_046469.1@COX1 90.962 1538 137 2 3549 5085 1 1537 0.0 2069
contig_2876 NC_044759.1@COX1 90.201 1541 145 3 3549 5086 1 1538 0.0 2004
contig_2876 NC_018339.1@COX1 90.137 1531 151 0 3558 5088 1 1531 0.0 1991
My understanding is that blast will round a very small e-value to 0 if its a good match.
However, I noticed that this could be an issue with the way the max value is added in the python script: line 84 mitofinder
parser.add_argument('-e', '--blast-eval', help='e-value of blast program used for contig identification and annotation. Default = 0.00001', type=float,
default=0.00001, dest='blasteVal')
I was also missing some other genes before adding mtDNA reference from a sister genus, which I think might explain why some of the genes were missing. Here is a good example, ND4: ~ 1.3k bp reference genes. Which now appear with near zero evalues for the longest alignment length contig.
contig_2359 NC_016469.1@ND4 90.280 751 73 0 259262 260012 1290 540 0.0 983
contig_2359 NC_036507.1@ND4 90.280 751 73 0 259262 260012 1290 540 0.0 983
contig_2359 NC_046469.1@ND4 89.614 751 78 0 259262 260012 1290 540 0.0 955
contig_2359 NC_044759.1@ND4 89.418 756 76 4 259262 260017 1290 539 0.0 950
contig_2359 NC_018339.1@ND4 89.390 754 78 2 259262 260015 1290 539 0.0 948
...
contig_3886 NC_016469.1@ND4 89.389 311 32 1 837341 837651 839 1148 1.64e-107 390
contig_3886 NC_018339.1@ND4 89.389 311 32 1 837341 837651 839 1148 1.64e-107 390
contig_3886 NC_036507.1@ND4 89.644 309 29 3 837341 837648 839 1145 1.64e-107 390
contig_3886 NC_046469.1@ND4 88.746 311 34 1 837341 837651 839 1148 3.56e-104 379
contig_3886 NC_044759.1@ND4 87.539 321 36 4 837335 837653 832 1150 7.70e-101 368
I would normally try tweaking the code myself, but I am using the singularity container and don't know how to edit anything there.
Any advice would be greatly appreciated!
To clarify, I am using oxford nanopore output, so this might be part of the issue. However, I have kept the max length for contigs as default (25k bp); so I do not think it would be in conflict with the framework of using mitofinder for UCE assemblies. It may still not be appropriate, but I thought you would want to be made aware of this issue.
system information:
ubuntu server, mitofinder version: mitofinder_v1.4.2.sif, singularity version: 3.7.1
for my own blastn: BLAST 2.12.0+