nanocomp
NanoStats read number does not match other tools
I noticed that the number of reads reported by NanoComp does not match that of seqkit stats --all or samtools stats at various stages, from filtering through to mapping the basecalled reads.
For instance, the sequencing_summary initially tells me I have 36,903,362 reads in total (pass and fail).
pycoQC and Nanopore's own BasicQC report tell me 36,903,363.
NanoComp on the initial concatenation of the pass FASTQs: 24,768,601
BasicQC and pycoQC state: 24,768,601
seqkit stats: 24,768,601
NanoComp on the pychopper output FASTQ: 21,245,234
seqkit stats states: 21,245,235
NanoComp on the BAM file from mapping the pychopper FASTQ: 21,261,221
samtools stats states raw total sequences: 21,245,235 (matches seqkit on the pychopper output); reads mapped: 21,100,586
Looking through a couple of my samples, the NanoComp count sometimes differed by -1 and sometimes by -2 at the FASTQ level, but differed by quite a bit more at the BAM level.
I wasn't sure whether to file this under NanoGet or nanomath, as I assume those are what NanoComp uses to get the basic library stats before comparing across samples and plotting.
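To see the size of each discrepancy at a glance, the counts above can be tabulated and compared against one reference tool per stage. This is only arithmetic on the numbers reported in this issue; the stage and tool labels used as dictionary keys are chosen for illustration.

```python
# Read counts reported in this issue, grouped by processing stage and tool.
COUNTS = {
    "all reads (pass + fail)": {"sequencing_summary": 36_903_362, "pycoQC/BasicQC": 36_903_363},
    "pass FASTQ": {"NanoComp": 24_768_601, "pycoQC/BasicQC": 24_768_601, "seqkit": 24_768_601},
    "pychopper FASTQ": {"NanoComp": 21_245_234, "seqkit": 21_245_235},
    "BAM": {
        "NanoComp": 21_261_221,
        "samtools raw total": 21_245_235,
        "samtools mapped": 21_100_586,
    },
}


def differences(stage, reference):
    """Return each tool's count minus the reference tool's count for one stage."""
    counts = COUNTS[stage]
    ref = counts[reference]
    return {tool: n - ref for tool, n in counts.items() if tool != reference}


if __name__ == "__main__":
    references = {
        "all reads (pass + fail)": "sequencing_summary",
        "pass FASTQ": "seqkit",
        "pychopper FASTQ": "seqkit",
        "BAM": "samtools raw total",
    }
    for stage, ref in references.items():
        print(f"{stage} (vs {ref}): {differences(stage, ref)}")
```

At the BAM stage this gives NanoComp 15,986 reads above samtools' raw total sequences and 160,635 above its reads-mapped figure, whereas the FASTQ stages differ by at most one read.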
For the FASTQ files, did you try wc -l to just have a look at the number of lines? The off-by-one difference is interesting. Where do you see a -2 difference?
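As a cross-check independent of NanoComp, seqkit and wc -l, the reads in a FASTQ file can be counted by walking it in four-line records. This is a minimal sketch assuming standard unwrapped FASTQ (exactly four lines per record, as basecallers normally write it); the function names are hypothetical and this is not how NanoGet or nanomath count reads.

```python
import gzip
import sys


def count_fastq_records(lines):
    """Return (line_count, record_count) for an iterable of FASTQ lines.

    Assumes every record spans exactly four lines (header, sequence, '+', quality);
    FASTQ with wrapped sequence lines would be rejected or miscounted.
    """
    line_count = 0
    record_count = 0
    for i, line in enumerate(lines):
        line_count += 1
        if i % 4 == 0:
            # Lines 0, 4, 8, ... must be record headers starting with '@'.
            if not line.startswith("@"):
                raise ValueError(f"line {i + 1}: expected '@' header, got {line[:20]!r}")
            record_count += 1
    if line_count % 4 != 0:
        raise ValueError(f"{line_count} lines is not a multiple of 4; file may be truncated")
    return line_count, record_count


def count_fastq(path):
    """Open a plain or gzip-compressed (.gz) FASTQ file and count its records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_fastq_records(handle)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        lines, reads = count_fastq(path)
        print(f"{path}\t{lines} lines\t{reads} reads")
```

Under the four-line assumption the read count is simply the line count divided by 4. Note that wc -l counts newline characters, so if the last line of a file has no trailing newline, wc -l reports one line fewer than this sketch does.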
For the BAM files, only primary alignments are retained; secondary alignments and unaligned reads are removed.
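The filtering rule above can be expressed in terms of SAM FLAG bits: 0x4 (unmapped), 0x100 (secondary) and 0x800 (supplementary). The sketch below compares a "primary mapped" count with a raw total that excludes secondary and supplementary records, which is how the samtools stats documentation describes raw total sequences. It is an illustration, not NanoGet's actual code. In particular, whether NanoComp also drops supplementary alignments is not stated here, so that is exposed as a hypothetical drop_supplementary parameter.

```python
# SAM FLAG bits, as defined in the SAM format specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def is_primary_mapped(flag, drop_supplementary=True):
    """True if a record is aligned and not secondary (optionally not supplementary)."""
    if flag & (FLAG_UNMAPPED | FLAG_SECONDARY):
        return False
    if drop_supplementary and flag & FLAG_SUPPLEMENTARY:
        return False
    return True


def summarize_flags(flags, drop_supplementary=True):
    """Count records three ways from an iterable of integer SAM flags."""
    flags = list(flags)
    non_primary = FLAG_SECONDARY | FLAG_SUPPLEMENTARY
    return {
        # Every alignment record in the file.
        "records": len(flags),
        # One record per read: secondary and supplementary records excluded.
        "raw_total": sum(1 for f in flags if not f & non_primary),
        # Reads kept under the "primary alignments only" rule.
        "primary_mapped": sum(1 for f in flags if is_primary_mapped(f, drop_supplementary)),
    }
```

In practice the flags could be streamed from a BAM file with pysam, for example (r.flag for r in pysam.AlignmentFile(path, "rb").fetch(until_eof=True)). Under this rule the primary-mapped count can never exceed the raw total, which is why a BAM-level count above samtools' raw total sequences is worth investigating.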