kraken2 with plusPFP and plant sample

Open vebaev opened this issue 1 year ago • 7 comments

Dear all,

I have an RNA-seq dataset (PE, ss, rRNA removed) from a moss species (no genome available). I ran Kraken2 (default options) with the PlusPFP DB (which contains some moss genomes), and what confuses me is that it finds only bacterial and some fungal reads (30%), with 70% unclassified. I expected some percentage to be classified as plant, but strangely that is not the case. Am I doing something wrong?

vebaev avatar Dec 25 '23 18:12 vebaev

Did you download this database from the AWS site? The databases provided there only use complete genomes, so it could be that no genome is closely related enough to your moss species for it to show up.

jenniferlu717 avatar Jan 08 '24 15:01 jenniferlu717

I'm using Kraken2 and PlusPFP via Galaxy EU. I thought the same, but then I tried a dataset from a paper and the result was the same: only bacteria, fungi and viruses, and not a single read assigned to plant... strange!

vebaev avatar Jan 08 '24 18:01 vebaev

Then it may be a result of how the sample was extracted and processed for sequencing.

jenniferlu717 avatar Mar 15 '24 20:03 jenniferlu717

Hi, to add to this, we recently used the Kraken2 PlusPFP-16 reference downloaded from https://benlangmead.github.io/aws-indexes/k2. We ran Kraken2 with default parameters and found no reads classified to plant genomes. We are very confident that our samples contain plant reads, so we took 10 reads at random from our raw data, ran BLASTn against genomes, and found that all 10 aligned to plant genomes. We're not using Galaxy, but there appears to be an issue there about a mix-up where the databases do not contain plant. Could there be an issue with the pre-compiled indexes?

jamesboot avatar Mar 20 '24 15:03 jamesboot
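
For anyone who wants to repeat the spot check described above, here is a minimal sketch of drawing a few reads at random from a FASTQ file and writing them as FASTA for a manual BLASTn search. It is not the code used in the comment. It assumes an uncompressed FASTQ with exactly four lines per record, and the function names `sample_fastq_reads` and `to_fasta` are made up for illustration.

```python
import random


def sample_fastq_reads(handle, k=10, seed=0):
    """Reservoir-sample k (name, sequence) pairs from a 4-line-per-record FASTQ handle."""
    rng = random.Random(seed)
    sample = []
    n = 0  # number of records seen so far
    while True:
        header = handle.readline()
        if not header:
            break
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line (ignored)
        handle.readline()  # quality line (ignored)
        n += 1
        record = (header.strip()[1:], seq)  # drop the leading '@'
        if len(sample) < k:
            sample.append(record)
        else:
            # Keep each new record with probability k / n.
            j = rng.randrange(n)
            if j < k:
                sample[j] = record
    return sample


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

Reservoir sampling is used so the whole file never has to be held in memory. The resulting FASTA text can be pasted into a BLASTn search; ten reads are only a sanity check, not an estimate of plant content.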

Yes, on Galaxy (all servers) there is an issue where PlusPFP is the same as PlusPF, so plants are missing. They are currently investigating where the issue comes from: https://github.com/galaxyproject/idc/issues/37

vebaev avatar Mar 20 '24 18:03 vebaev
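
One way to check whether a given database or run contains any plant taxa at all is to look for Viridiplantae (NCBI taxid 33090) in a Kraken2 report. The sketch below assumes the standard tab-separated report layout (percentage, clade count, direct count, rank code, taxid, name); `kraken2-inspect` output should follow the same layout, with minimizer counts in place of read counts. The function name `clade_count` is hypothetical.

```python
VIRIDIPLANTAE_TAXID = 33090  # NCBI taxonomy ID for Viridiplantae (green plants)


def clade_count(report_lines, taxid):
    """Return the clade-level count for `taxid` in a Kraken2-style report, or None if absent."""
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip comment or malformed lines
        try:
            # The taxid is the second-to-last column and the name is the last,
            # which also holds when extra minimizer columns are present.
            if int(fields[-2]) == taxid:
                return int(fields[1])
        except ValueError:
            continue
    return None
```

If this returns `None` for 33090 on the database's inspect output, the index contains no plant sequences, which would match the PlusPF/PlusPFP mix-up described above. A count of zero or `None` on a sample report alone does not distinguish a missing library from a sample without close relatives in the database.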

I ran Kraken2 on RNA-seq data from some fungi, mainly Colletotrichum spp., against PlusPFP (k2_pluspfp_20240112.tar.gz). Nearly half of the reads are unclassified! It's worse. I think the RefSeq genome collection is too small for this kind of classification, especially when there is little genome data for the organism being studied.

permia avatar May 23 '24 04:05 permia

Pity that the PlusPFP issue still persists and has not been addressed for half a year so far...

vebaev avatar May 23 '24 10:05 vebaev