
validity (QC) of prediction files based on a custom reference database

Open ramtinz opened this issue 3 years ago • 2 comments

Quick question: I used Kaiju with a custom reference database to classify the reads in paired-end FASTQ files. In the resulting prediction file, some reads are assigned taxon IDs from the reference database, while others are assigned NCBI taxon IDs that are not in it (both groups of reads are marked as classified: C). There is also a third group of unclassified reads. I am not sure whether this can be regarded as a valid prediction file, as I was expecting each read to be either classified to a taxon ID from the custom reference database or returned as unclassified. I could not find an example of this in the repository. Could you please elaborate? Thank you.

ramtinz avatar Jan 14 '22 08:01 ramtinz

Check what your "other NCBI taxon IDs" are. Note that Kaiju computes the LCA (lowest common ancestor) of taxa with equally good matches.
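To illustrate the LCA point, here is a minimal sketch. It is not Kaiju's actual implementation; the taxon IDs, the `parent` map, and the `db_taxa` set are hypothetical toy values. It only shows why a read that matches two database taxa equally well can end up with an ancestor ID that is not itself in the database.

```python
# Toy taxonomy (hypothetical IDs): child -> parent. The root is its own parent.
parent = {
    1: 1,      # root
    10: 1,     # genus A
    101: 10,   # species A1
    102: 10,   # species A2
    20: 1,     # genus B
    201: 20,   # species B1
}

# Assumption: the custom database only contains the species-level entries.
db_taxa = {101, 102, 201}


def lineage(taxid, parent):
    """Return the path from taxid up to the root, starting with taxid."""
    path = [taxid]
    while parent[taxid] != taxid:
        taxid = parent[taxid]
        path.append(taxid)
    return path


def lca(taxids, parent):
    """Lowest common ancestor of one or more taxon IDs."""
    taxids = list(taxids)
    first = lineage(taxids[0], parent)
    shared = set(first)
    for t in taxids[1:]:
        shared &= set(lineage(t, parent))
    # Walk up from the first taxon; the first shared node is the LCA.
    for node in first:
        if node in shared:
            return node


# A read matching species 101 and 102 equally well gets their LCA.
assigned = lca([101, 102], parent)
print(assigned, assigned in db_taxa)
```

In this toy case the read is assigned the genus-level ID, which is an ancestor of database taxa but not a member of `db_taxa`. One way to sanity-check a real prediction file along these lines would be to confirm that every "other" taxon ID is an ancestor of at least one taxon in the custom database.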

pmenzel avatar Jan 14 '22 12:01 pmenzel

From what I understand, reads can be classified to taxon IDs that are NOT present in the custom reference database. So such a result could be valid, right?

ramtinz avatar Jan 14 '22 14:01 ramtinz