processing paired reads

Open pedres opened this issue 3 months ago • 0 comments

Hi, I have searched here and in the manual, but I am not sure how to proceed. The manual says that the "--paired option to kraken2 will indicate to kraken2 that the input files provided are paired read data, and data will be read from the pairs of files concurrently." So, as I understand it, to classify a sample with paired reads I have to pass the --paired flag. However, in "Metagenome analysis using the Kraken software suite", the kraken2 command in the microbiome protocol does not have the --paired flag that appears in the pathogen protocol. Looking at the code at https://github.com/martin-steinegger/kraken-protocol, none of the kraken2 commands have the --paired flag. So, what would be the correct approach to process a set of paired reads? In fact, if I run:

kraken2 --db $DATABASE --memory-mapping --threads 20 --report krak_test/VCM180_paired.k2report --paired shotgun_NOVOG/VCM180_R1.fq.gz shotgun_NOVOG/VCM180_R2.fq.gz > krak_test/VCM180.kraken2
Loading database information... done.
68391295 sequences (20505.76 Mbp) processed in 390.736s (10501.9 Kseq/m, 3148.79 Mbp/m).
205168 sequences classified (0.30%)
68186127 sequences unclassified (99.70%)

kraken2 --db $DATABASE --memory-mapping --threads 20 --report krak_test/VCM180_notpaired.k2report shotgun_NOVOG/VCM180_R1.fq.gz shotgun_NOVOG/VCM180_R2.fq.gz > krak_test/VCM180.kraken2
Loading database information... done.
136782590 sequences (20505.76 Mbp) processed in 362.249s (22655.6 Kseq/m, 3396.41 Mbp/m).
197818 sequences classified (0.14%)
136584772 sequences unclassified (99.86%)

bracken -d $DATABASE -i krak_test/VCM180_notpaired.k2report -o krak_test/VCM180_notpaired.bracken -w krak_test/VCM180_notpaired.breport -r 150 -l S
bracken -d $DATABASE -i krak_test/VCM180_paired.k2report -o krak_test/VCM180_paired.bracken -w krak_test/VCM180_paired.breport -r 150 -l S
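To see how the two Bracken runs differ per species, the two output files can be put side by side. The sketch below is only an illustration: `compare_bracken` is a hypothetical helper (not part of Bracken), and it assumes the usual tab-separated Bracken output with one header line, the species name in column 1 and `new_est_reads` in column 6.

```shell
# compare_bracken: hypothetical helper, not part of Bracken.
# Assumes tab-separated Bracken output with a header line,
# name in column 1 and new_est_reads in column 6.
# $1: Bracken output from the --paired run
# $2: Bracken output from the run without --paired
# Prints: name, paired estimate, unpaired estimate, unpaired/paired ratio.
# Taxa that appear only in the first file are not printed.
compare_bracken() {
    awk -F'\t' '
        FNR == 1  { next }                      # skip the header of each file
        NR == FNR { paired[$1] = $6; next }     # remember estimates from file 1
        {
            p = ($1 in paired) ? paired[$1] : 0
            ratio = (p > 0) ? sprintf("%.2f", $6 / p) : "NA"
            printf "%s\t%d\t%d\t%s\n", $1, p, $6, ratio
        }
    ' "$1" "$2"
}
```

With the files from the commands above, this would be called as `compare_bracken krak_test/VCM180_paired.bracken krak_test/VCM180_notpaired.bracken`.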

Without the --paired flag, kraken2 processes twice as many sequences (it classifies the reads in the R1 and R2 FASTQ files separately), and this affects the counts in the k2report and the Bracken estimates.
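The numbers in the two kraken2 summaries can be cross-checked with a few lines of shell. This is a minimal sketch using only the counts reported above; `pct` is a hypothetical helper introduced here for the percentage calculation.

```shell
# Sequence counts copied from the two kraken2 summaries in this issue.
paired_total=68391295
paired_classified=205168
unpaired_total=136782590
unpaired_classified=197818

# pct: hypothetical helper that prints 100 * part / total with two decimals.
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * part / total }'
}

# If each mate is counted as its own sequence without --paired,
# the unpaired total should be exactly twice the number of read pairs.
if [ "$unpaired_total" -eq $((2 * paired_total)) ]; then
    echo "unpaired total is exactly 2x the paired total"
fi

echo "paired:   $(pct "$paired_classified" "$paired_total")% classified"
echo "unpaired: $(pct "$unpaired_classified" "$unpaired_total")% classified"
```

The doubling check passes for these counts, and the percentages reproduce the 0.30% and 0.14% printed by kraken2, which is consistent with each pair being counted once with --paired and each mate being counted separately without it.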

Thank you very much for your help.

VCM180_notpaired_bracken.txt
VCM180_notpaired_k2report.txt
VCM180_paired_bracken.txt
VCM180_paired_k2report.txt

Manuel

pedres • Nov 27 '24 13:11