krakenuniq
Discrepant results when including viral neighbor references
Dear program authors and users,
I ran krakenuniq on a database built only from the RefSeq viral sequences:
krakenuniq-download --db ${outdir} -threads 26 --dust refseq/viral/Any
krakenuniq-build --db ${outdir} --kmer-len 31 --threads 26 --taxids-for-genomes --taxids-for-sequences
In the results, I look for species above a k-mer or coverage threshold, and then look at the assigned reads to identify the most plausible genome sequence for each identified species. Specifically, in this example:
Most of the alphapapillomavirus 7 reads are assigned to NC_001357.1, so I would consider this a valid reference genome.
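For clarity, the per-sequence tally behind this observation can be sketched roughly as below. This is only an illustration, not the exact script I used: it assumes the Kraken-style per-read output (tab-separated, with the C/U flag in column 1 and the assigned taxid in column 3), and the function names, read names, and taxids are made up for the example.

```python
from collections import Counter

def reads_per_taxid(lines):
    """Count classified reads per assigned taxid.

    Assumes Kraken-style per-read lines: flag, read id, taxid, length, k-mer LCA string.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and unclassified ("U") reads.
        if len(fields) < 3 or fields[0] != "C":
            continue
        counts[fields[2]] += 1
    return counts

def top_share(counts):
    """Return the taxid with the most reads and its fraction of all counted reads."""
    total = sum(counts.values())
    if total == 0:
        return None, 0.0
    taxid, n = counts.most_common(1)[0]
    return taxid, n / total

# Hypothetical data: taxids 9000001 and 9000002 stand in for sequence-level taxids.
example = [
    "C\tread1\t9000001\t150\t9000001:120",
    "C\tread2\t9000001\t150\t9000001:118",
    "C\tread3\t9000001\t150\t9000001:115",
    "C\tread4\t9000001\t150\t9000001:120",
    "C\tread5\t9000002\t150\t9000002:90",
    "U\tread6\t0\t150\t0:120",
]

counts = reads_per_taxid(example)
best, share = top_share(counts)
print(best, share)
```

In practice the counts would first be restricted to the taxids under the species of interest; a high share for one sequence is what I take as evidence for a single plausible reference.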
I noticed that when I analyse the same reads using a database that also includes the GenBank viral neighbors, built with
krakenuniq-download --db ${outdir} -threads 26 --dust refseq/viral/Any viral-neighbors
krakenuniq-build --db ${outdir} --kmer-len 31 --threads 26 --taxids-for-genomes --taxids-for-sequences
the results are quite different:
Indeed, for the same species, alphapapillomavirus 7, a similar number of reads is identified, but there is no single sequence to which most of the reads are assigned, and NC_001357.1 has very few reads assigned. How can this be reconciled with the previous result?
Do you have any suggestions? Am I misinterpreting the results?
Thanks in advance,
Luigi