
NanoStats read number does not match other tools

Open callumparr opened this issue 2 years ago • 1 comments

I noticed that the number of reads reported by NanoComp does not match that of seqkit stats --all or samtools stats at various stages, from filtering through to mapping the basecalled reads.

For instance, the sequencing_summary initially tells me I have 36,903,362 reads in total (pass and fail)

pycoQC and Nanopore's own BasicQC report tell me 36,903,363

NanoComp on the initial concatenation of the pass FASTQs 24,768,601

BasicQC and pycoQC state 24,768,601

seqkit stats: 24,768,601

NanoComp on the pychopper output FASTQ 21,245,234

seqkit stats reports 21,245,235

NanoComp on the BAM alignment file produced by mapping the pychopper FASTQ 21,261,221

samtools stats reports raw total sequences: 21,245,235 (matches seqkit on the pychopper output); reads mapped: 21,100,586

I looked through a couple of my samples: at the FASTQ level the NanoComp count sometimes differed by -1 and sometimes by -2, but at the BAM level it was quite a bit different.

I wasn't sure whether to put this in NanoGet or nanomath, as I assume this is what NanoComp uses to get the basic library stats before comparing across samples and plotting.

callumparr, Apr 13 '22 05:04

For the fastq files - did you try with wc -l to just have a look at the number of lines? The off-by-one difference is interesting. Where do you see a -2 difference?
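A minimal Python sketch of that line-count check, assuming an unwrapped FASTQ with exactly four lines per record. The function name `count_fastq` and the in-memory demo data are hypothetical and not part of NanoComp or NanoGet. It also reports the number of newline characters, which is what `wc -l` counts, so a file whose last line lacks a trailing newline shows one line fewer there. That is only one possible source of an off-by-one, not a diagnosis of this case.

```python
import io


def count_fastq(handle):
    """Count lines and 4-line records in a FASTQ text stream.

    Assumes unwrapped FASTQ (4 lines per record); hypothetical helper,
    not the logic used by NanoComp/NanoGet.
    """
    n_lines = 0
    last_line = ""
    for line in handle:
        n_lines += 1
        last_line = line
    # wc -l counts newline characters, so an unterminated last line is not counted.
    ends_with_newline = last_line.endswith("\n") if n_lines else True
    return {
        "lines": n_lines,
        "records": n_lines // 4,
        "leftover_lines": n_lines % 4,  # non-zero hints at truncated or wrapped records
        "newline_chars": n_lines if ends_with_newline else n_lines - 1,
    }


# Demo: two records, the last line has no trailing newline.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII")
stats = count_fastq(demo)
print(stats)
```

For a real file, the same function could be called on `open(path)` or, for compressed input, on `gzip.open(path, "rt")`.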

For the bam files, only primary reads are retained; secondary and unaligned reads are removed.
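To see how such filtering changes a count, here is a small sketch that tallies alignment records by category from SAM text lines. It assumes the standard SAM flag bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary); the helpers `classify` and `tally_sam` are hypothetical and are not NanoGet's implementation. The thread does not say how supplementary alignments are treated, so they are tallied separately here.

```python
from collections import Counter

# Flag bits as defined in the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def classify(flag):
    """Assign a SAM record to one category based on its FLAG value."""
    if flag & UNMAPPED:
        return "unmapped"
    if flag & SECONDARY:
        return "secondary"
    if flag & SUPPLEMENTARY:
        return "supplementary"
    return "primary"


def tally_sam(lines):
    """Count records per category from SAM text lines (header lines are skipped)."""
    counts = Counter()
    for line in lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        counts[classify(int(fields[1]))] += 1
    return counts


# Demo with made-up records: only QNAME and FLAG matter for this tally.
demo_sam = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100",
    "read2\t16\tchr1\t200",
    "read2\t256\tchr2\t300",
    "read3\t2048\tchr1\t400",
    "read4\t4\t*\t0",
]
counts = tally_sam(demo_sam)
print(dict(counts))
```

Because secondary and supplementary alignments add records for reads that are already counted, the number of records in a BAM can exceed the number of reads, so the category being counted matters when comparing tools.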

wdecoster avatar Apr 13 '22 20:04 wdecoster