kraken2
kraken2 with plusPFP and plant sample
Dear all,
I have an RNA-seq dataset (PE, ss, rRNA removed) from a moss species (no genome available). I ran Kraken2 (default options) with the PlusPFP DB (which contains some moss genomes), and what confuses me is that it finds only bacterial and some fungal reads (30%), with 70% unclassified. I expected at least some percentage to be classified as plant, but strangely that is not the case. Am I doing something wrong?
Did you download this database from the AWS site? The databases provided there use only complete genomes, so it could be that no genome in the database is closely related enough to your moss species for plant reads to show up.
I'm using Kraken2 and PlusPFP via Galaxy EU. I thought the same, but then I tried a dataset from a paper and the result was the same: only bacteria, fungi and viruses, and not a single read assigned to plant... strange!
Then it may be a result of how the sample was extracted and processed for sequencing.
Hi, to add to this, we recently used the Kraken2 PlusPFP-16 reference downloaded from: https://benlangmead.github.io/aws-indexes/k2. We ran Kraken2 with default parameters and found no reads classified to plant genomes. We are very confident that our samples contain plant genome reads, so we took 10 reads at random from our raw data, performed a BLASTn against genomes, and found that all 10 randomly sampled reads aligned to plant genomes. We're not using Galaxy, but there appears to be an issue there regarding a mix-up with the databases not containing plant. Could there be an issue with the pre-compiled indexes?
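For anyone who wants to repeat this kind of spot check, here is a minimal sketch of drawing a few random reads from a FASTQ file and writing them as FASTA for BLASTn. It is not the script used above. The function names `sample_fastq_records` and `to_fasta` are made up for illustration, and the code assumes plain 4-line FASTQ records (no wrapped sequences).

```python
import random
from itertools import islice


def sample_fastq_records(lines, k, seed=0):
    """Reservoir-sample up to k records from an iterable of FASTQ lines.

    Assumes exactly 4 lines per record (header, sequence, '+', qualities).
    Reads the input once, so it also works on large files.
    """
    rng = random.Random(seed)
    reservoir = []
    it = iter(lines)
    n = 0  # number of complete records seen so far
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            break  # end of input (or a truncated trailing record)
        n += 1
        if len(reservoir) < k:
            reservoir.append(record)
        else:
            # Keep the new record with probability k / n.
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = record
    return reservoir


def to_fasta(records):
    """Convert sampled FASTQ records to FASTA text for pasting into BLASTn."""
    out = []
    for header, seq, _plus, _qual in records:
        out.append(">" + header.strip()[1:])  # drop the leading '@'
        out.append(seq.strip())
    return "\n".join(out)
```

With an uncompressed file this could be used as `to_fasta(sample_fastq_records(open("reads.fastq"), 10))`, where `reads.fastq` is a placeholder path.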
Yes, on Galaxy (all servers) there is an issue where PlusPFP is the same as PlusPF, so plants are missing. They are currently investigating where the issue comes from: https://github.com/galaxyproject/idc/issues/37
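If you have a local copy of the database, one way to check for this yourself is to look for a plant taxon in the output of `kraken2-inspect`. Below is a rough sketch, under two assumptions: that the inspect output is tab-separated in the Kraken report style with the taxid in the second-to-last column, and that NCBI taxid 33090 (Viridiplantae) is the right marker for "plants". The function name `report_contains_taxid` is hypothetical.

```python
VIRIDIPLANTAE_TAXID = 33090  # NCBI taxid for Viridiplantae (assumed marker for plants)


def report_contains_taxid(lines, taxid):
    """Return True if a Kraken-style report/inspect listing mentions `taxid`.

    Assumes tab-separated lines where the taxid is the second-to-last field
    and the (indented) scientific name is the last field. Lines starting
    with '#' are treated as header comments and skipped.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        try:
            if int(fields[-2]) == taxid:
                return True
        except ValueError:
            continue  # not a data line in the assumed format
    return False
```

For example, after saving the inspect output to a text file (here called `inspect.txt` as a placeholder), `report_contains_taxid(open("inspect.txt"), VIRIDIPLANTAE_TAXID)` returning `False` would suggest the database has no plant sequences.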
I ran Kraken2 on RNA-seq data from some fungi, mainly Colletotrichum spp., against PlusPFP (k2_pluspfp_20240112.tar.gz). Nearly half of the reads are unclassified, which is really bad. I think the RefSeq genome collection is too small for this kind of classification, especially when little genome data is available for the organism you study.
It is a pity that the PlusPFP issue still persists and has not been addressed for half a year so far...