
Filter possible NUMTs from BOLDigger assignments

Open AlvaroFueyo opened this issue 7 months ago • 1 comments

During metabarcoding amplification, nuclear copies of the mitochondrial target gene (NUMTs) may be amplified alongside the mitochondrial gene itself. These copies can lead to false-positive detections and identifications in the BOLDigger and JAMP pipeline.

NCBI marks such copies as UNVERIFIED in its database (GenBank) when it detects internal stop codons or indels inside the coding sequence, where there should not be any.

It might not be too hard to have BOLDigger detect these sequences by adding a new flag when internal stop codons are found in a sequence after it has been assigned. The check would have to run after assignment because the stop codons depend on the genetic code of the taxonomic group each OTU belongs to.
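To make the idea concrete, here is a rough sketch of what such a check could look like. It is not BOLDigger code: the function names `stops_in_frame` and `flag_possible_numt` are made up for illustration, only three NCBI genetic code tables are included, and it assumes the OTU sequence is in forward orientation with an unknown reading frame. A sequence is flagged only if every forward frame contains at least one stop codon.

```python
# Stop codons per NCBI genetic code table id (small illustrative subset).
STOP_CODONS = {
    1: {"TAA", "TAG", "TGA"},         # standard code
    2: {"TAA", "TAG", "AGA", "AGG"},  # vertebrate mitochondrial
    5: {"TAA", "TAG"},                # invertebrate mitochondrial
}


def stops_in_frame(seq, frame, table_id):
    """Return the start positions of stop codons in one forward reading frame."""
    stops = STOP_CODONS[table_id]
    seq = seq.upper().replace("-", "")  # drop alignment gaps, if any
    return [
        i for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in stops
    ]


def flag_possible_numt(seq, table_id):
    """Flag a sequence if all three forward frames contain a stop codon.

    Hypothetical helper: the table id would be chosen from the taxonomy
    that was assigned to the OTU (e.g. 5 for most invertebrates).
    """
    return all(stops_in_frame(seq, frame, table_id) for frame in range(3))


if __name__ == "__main__":
    otu = "ATGAGAGCC"
    # AGA is a stop codon in the vertebrate mitochondrial code only.
    print(stops_in_frame(otu, 0, 2))
    print(stops_in_frame(otu, 0, 5))
    print(flag_possible_numt("TAAATAAATAAA", 5))
```

Checking all three frames avoids having to know where the amplicon starts within the gene, at the cost of missing NUMTs that happen to keep one frame open. Indels that shift the frame without introducing a stop codon would need a separate check, for example against the expected amplicon length.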

Thank you very much for this wonderful pipeline, best regards!

AlvaroFueyo, Nov 28 '23 13:11