kaiju
validity (QC) of prediction files based on a custom reference database
Quick question: I used Kaiju to classify the reads in paired-end FASTQ files against a custom reference database. In the resulting prediction file, some reads are assigned taxon IDs from the reference database, while others are assigned other NCBI taxon IDs (both groups are marked as classified, C). There is also a third group of unclassified reads. I am not sure whether this can be regarded as a valid prediction file, as I was expecting each read to be either classified to a taxon ID from the custom reference database or returned as unclassified. I could not find an example of this in the repository. Could you please elaborate on this? Thank you.
Check what your "other NCBI taxon IDs" are. Note that Kaiju computes the lowest common ancestor (LCA) of the taxa with equally good matches.
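To illustrate the LCA point, here is a minimal Python sketch of a lowest-common-ancestor computation over a child-to-parent map. It is not Kaiju's actual implementation; the function names and the toy taxonomy (`TOY_PARENT`, with made-up IDs) are assumptions for illustration only. In practice the parent map would come from the taxonomy file used to build the database.

```python
# Toy child -> parent map with made-up taxon IDs (hypothetical, not real NCBI IDs).
# The root (1) is its own parent.
TOY_PARENT = {
    1: 1,      # root
    10: 1,     # genus A
    101: 10,   # species A1 (in the custom database)
    102: 10,   # species A2 (in the custom database)
    20: 1,     # genus B
    201: 20,   # species B1 (in the custom database)
}

def lineage(taxid, parent):
    """Return the path from taxid up to the root, starting with taxid itself."""
    path = [taxid]
    while parent[taxid] != taxid:
        taxid = parent[taxid]
        path.append(taxid)
    return path

def lca(taxids, parent):
    """Return the deepest taxon shared by the lineages of all given taxon IDs."""
    taxids = list(taxids)
    common = lineage(taxids[0], parent)  # ordered from leaf to root
    for t in taxids[1:]:
        ancestors = set(lineage(t, parent))
        common = [x for x in common if x in ancestors]
    return common[0]  # first surviving entry is the deepest common ancestor

# A read matching species 101 and 102 equally well would be reported at genus 10,
# an ID that is in the taxonomy but not among the database's own sequence IDs.
print(lca([101, 102], TOY_PARENT))
```

If the "other" taxon IDs in your output turn out to be ancestors of taxa in your database, this is the likely explanation.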
From what I understand, reads can be classified to taxon IDs that are NOT present in the custom reference database. So such a result could be valid, right?
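One way to check this on a given output file is to tally the reads by group and list the "other" taxon IDs for inspection. The sketch below assumes the default tab-separated output layout (classification status `C`/`U`, read name, taxon ID); the function name `tally_kaiju_output` and the example IDs are hypothetical.

```python
from collections import Counter

def tally_kaiju_output(lines, db_taxids):
    """Split reads into unclassified, classified to a database ID,
    and classified to some other ID. Assumes columns: status, read name, taxon ID."""
    counts = Counter()
    other_ids = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        status, taxid = fields[0], int(fields[2])
        if status == "U":
            counts["unclassified"] += 1
        elif taxid in db_taxids:
            counts["classified_db_id"] += 1
        else:
            counts["classified_other_id"] += 1
            other_ids[taxid] += 1
    return counts, other_ids

# Made-up example lines; for a real file, pass an open file handle as `lines`.
example = ["C\tread1\t101", "C\tread2\t10", "U\tread3\t0"]
counts, other_ids = tally_kaiju_output(example, db_taxids={101, 102, 201})
print(dict(counts))
print(dict(other_ids))
```

The IDs collected in `other_ids` can then be looked up in the taxonomy to see whether they are ancestors of taxa in the custom database.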