nanopolish
nanopolish copied to clipboard
discrepancy in read counts between nanopolish index and eventalign
Hi Jared,
As requested I open a new issues due to the discrepancy between the "read count" output from nanopolish index and eventalign commands. Here is an example from one of my datasets, but I find this same issue in all of them:
Nanopolish index output: [readdb] indexing /media/data/nuria/nanopore/raw/mouse/sk_raw [readdb] num reads: 3074619, num reads with path to fast5: 3074619
Nanopolish eventalign output from the same sample: [post-run summary] total reads: 940575, unparseable: 0, qc fail: 18674, could not calibrate: 7503, no alignment: 645, bad fast5: 0
From what I've understood from another issue I've read, the way that the tool is counting the reads is different. As far as I understood it, Nanopolish index gives me the "real" number of reads that I've sequenced. What I don't understand is what exactly is "total reads" after the Nanopolish eventalign command. As you can see here it looks like I go from 3000 K to less than 950 K, so I am wondering why.
Thank you so much :) Núria
Hi Nuria,
From what I've understood from another issue I've read, the way that the tool is counting the reads is different. As far as I understood it, Nanopolish index gives me the "real" number of reads that I've sequenced.
This is correct
What I don't understand is what exactly is "total reads" after the Nanopolish eventalign command.
It is the number of reads that eventalign tried to process. It will be equal to the number of reads that are mapped to the reference genome in the region that you specified (or the entire genome, if you did not specify a region). Can you count how many reads map to the reference and see if the number matches what eventalign reports?
Thanks, Jared
Hi,
Thanks for your answer. When running samtools flagstats on the corresponding sam file I get the following output:
3077261 + 0 in total (QC-passed reads + QC-failed reads) 3074619 + 0 primary 0 + 0 secondary 2642 + 0 supplementary 0 + 0 duplicates 0 + 0 primary duplicates 470309 + 0 mapped (15.28% : N/A) 467667 + 0 primary mapped (15.21% : N/A) 0 + 0 paired in sequencing 0 + 0 read1 0 + 0 read2 0 + 0 properly paired (N/A : N/A) 0 + 0 with itself and mate mapped 0 + 0 singletons (N/A : N/A) 0 + 0 with mate mapped to a different chr 0 + 0 with mate mapped to a different chr (mapQ>=5)
The reported mapped reads by flagstats and the ones from eventalign aren't quite matching.
Thanks. Núria
Could you try:
samtools view -F 4 alignments.bam | wc
sure, gives me this output: 470309 9411409 788813891
Are you using the latest version of nanopolish?
I'm using nanopolish 0.13.2, the one that got downloaded via conda when I installed the environment in May this year. I've just double checked and Nanopolish got updated to 0.14.0 6 hours ago in the conda repository. Is this fixed in the new version?
I still don't know the root cause of this issue but I'd appreciate if you could run 0.14.0 and let me know if you get the same count.
Jared
Actually I think it is fixed in 0.14.0. Reviewing the changes between versions, I remember now that there was a double-counting of reads in 0.13.2. It doesn't effect the result, only the final status line. Sorry for the trouble.
https://github.com/jts/nanopolish/commit/7fde4f482f5c0a04fd135178dc83be91c16fe85d