BOLDigger
Filter possible NUMTS from BOLDigger assignments
During metabarcoding amplification, nuclear copies of the mitochondrial gene (NUMTs) may be amplified along with the mitochondrial gene itself. These copies can lead to false positive detections and identifications in the BOLDigger and JAMP pipelines.
NCBI marks such sequences as UNVERIFIED in its database (GenBank) when it detects internal stop codons or indels inside the coding sequence of a gene, where none should occur.
It should not be too hard to add detection of these sequences to BOLDigger, for example as a new FLAG raised when internal stop codons are found in a sequence after it has been assigned. The check has to come after assignment because which codons act as stop codons depends on the taxonomic group (and therefore the mitochondrial genetic code) of each OTU.
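To illustrate the idea, here is a minimal sketch that is not part of BOLDigger. The names `possible_numt`, `frame_has_stop`, `STOP_CODONS` and `CLASS_TO_TABLE` are hypothetical, and the class-to-genetic-code mapping is deliberately incomplete. It assumes each OTU sequence is a forward-oriented fragment lying inside the gene (so any stop codon is internal) and that the reading frame is unknown, so a sequence is flagged only when all three forward frames contain a stop codon under the genetic code of its assigned class. The stop codon sets follow NCBI translation tables 1, 2 and 5.

```python
"""Hypothetical sketch of a possible-NUMT flag for assigned OTU sequences."""

from typing import Optional

# Stop codons for a few NCBI genetic code tables (keyed by transl_table ID).
STOP_CODONS = {
    1: {"TAA", "TAG", "TGA"},          # Standard (nuclear) code, for comparison
    2: {"TAA", "TAG", "AGA", "AGG"},   # Vertebrate mitochondrial code
    5: {"TAA", "TAG"},                 # Invertebrate mitochondrial code
}

# Hypothetical, incomplete mapping from assigned class to genetic code table.
CLASS_TO_TABLE = {
    "Insecta": 5,
    "Malacostraca": 5,
    "Clitellata": 5,
    "Actinopterygii": 2,
    "Mammalia": 2,
}


def frame_has_stop(seq: str, frame: int, stops: set) -> bool:
    """Return True if any complete codon in the given forward frame is a stop codon."""
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in stops:
            return True
    return False


def possible_numt(seq: str, taxon_class: str) -> Optional[bool]:
    """Flag a sequence as a possible NUMT if every forward frame contains a stop codon.

    Returns None when no genetic code is known for the assigned class.
    Assumes the sequence is a forward-oriented fragment internal to the gene.
    """
    table = CLASS_TO_TABLE.get(taxon_class)
    if table is None:
        return None
    stops = STOP_CODONS[table]
    clean = seq.upper().replace("-", "")  # normalise case and drop alignment gaps
    return all(frame_has_stop(clean, frame, stops) for frame in range(3))
```

For example, `AGACAGGCAGA` would be flagged under the vertebrate mitochondrial code (where AGA and AGG are stop codons) but not under the invertebrate mitochondrial code, which is why the check only makes sense after taxonomic assignment. This sketch does not catch frameshifting indels that produce no stop codon; a fuller version could translate with Biopython's `Seq.translate(table=...)` instead of hand-written stop codon sets.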
Thank you very much for this wonderful pipeline. Best regards!