kaiju validity (QC) of prediction files based on a custom reference database

Open ramtinz opened this issue 3 years ago • 2 comments

Quick question: I used Kaiju with a custom reference database to predict the reference genomes of reads in paired-end FASTQ files. In the resulting prediction file, some reads are assigned taxon IDs from the reference database, while others are assigned NCBI taxon IDs that are not in the database (both groups are marked as classified, C). A third group of reads is unclassified. I am not sure whether this can be regarded as a valid prediction file, as I expected each read to be either assigned a taxon ID from the custom reference database or returned as unclassified. I could not find an example of this in this repository. Could you please elaborate? Thank you.

Jan 14 '22 08:01 ramtinz

Check what your "other NCBI taxa IDs" are. Note that kaiju computes the LCA (lowest common ancestor) of taxa with equally good matches.

Jan 14 '22 12:01 pmenzel

From what I understand, reads can be classified to taxon IDs that are NOT present in a custom reference database, because the LCA of several equally good matches can be an ancestor taxon. So such a result could be valid, right?

Jan 14 '22 14:01 ramtinz
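
To illustrate the LCA point discussed above, here is a minimal sketch of how a read with several equally good matches can end up with a taxon ID that is absent from the reference database. This is not Kaiju's actual implementation: the taxonomy, the taxon IDs, and the function names (`lineage`, `lca`, `classify`) are all made up for illustration.

```python
from functools import reduce

# Toy taxonomy as child -> parent. All IDs are hypothetical, not real NCBI IDs.
# 1 = root, 10 = family, 20 and 21 = genera, 100/101/200 = species.
PARENT = {1: 1, 10: 1, 20: 10, 21: 10, 100: 20, 101: 20, 200: 21}

# Assumption: the custom reference database contains only these species.
DATABASE_TAXA = {100, 101, 200}


def lineage(taxon):
    """Return the path from a taxon up to the root, starting with the taxon."""
    path = [taxon]
    while PARENT[taxon] != taxon:
        taxon = PARENT[taxon]
        path.append(taxon)
    return path


def lca(a, b):
    """Return the lowest common ancestor of two taxa in the toy taxonomy."""
    ancestors_of_a = set(lineage(a))
    for taxon in lineage(b):
        if taxon in ancestors_of_a:
            return taxon
    raise ValueError("taxa do not share a root")


def classify(best_hits):
    """Assign a read to the LCA of all taxa with equally good matches."""
    return reduce(lca, best_hits)


for hits in ([100], [100, 101], [100, 200]):
    assigned = classify(hits)
    print(hits, "->", assigned, "in database:", assigned in DATABASE_TAXA)
```

In this sketch, a read that matches only species 100 keeps that ID, while a read that matches species 100 and 101 equally well is assigned genus 20, and one that matches 100 and 200 is assigned family 10. Neither 20 nor 10 is in the database, yet both reads count as classified, which is consistent with the output described in the question.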
