nanopolish icon indicating copy to clipboard operation
nanopolish copied to clipboard

discrepancy in read counts between nanopolish index and eventalign

Open NuriaDiaz opened this issue 1 year ago • 8 comments

Hi Jared,

As requested I open a new issues due to the discrepancy between the "read count" output from nanopolish index and eventalign commands. Here is an example from one of my datasets, but I find this same issue in all of them:

Nanopolish index output: [readdb] indexing /media/data/nuria/nanopore/raw/mouse/sk_raw [readdb] num reads: 3074619, num reads with path to fast5: 3074619

Nanopolish eventalign output from the same sample: [post-run summary] total reads: 940575, unparseable: 0, qc fail: 18674, could not calibrate: 7503, no alignment: 645, bad fast5: 0

From what I've understood from another issue I've read, the way that the tool is counting the reads is different. As far as I understood it, Nanopolish index gives me the "real" number of reads that I've sequenced. What I don't understand is what exactly is "total reads" after the Nanopolish eventalign command. As you can see here it looks like I go from 3000 K to less than 950 K, so I am wondering why.

Thank you so much :) Núria

NuriaDiaz avatar Jul 21 '22 17:07 NuriaDiaz

Hi Nuria,

From what I've understood from another issue I've read, the way that the tool is counting the reads is different. As far as I understood it, Nanopolish index gives me the "real" number of reads that I've sequenced.

This is correct

What I don't understand is what exactly is "total reads" after the Nanopolish eventalign command.

It is the number of reads that eventalign tried to process. It will be equal to the number of reads that are mapped to the reference genome in the region that you specified (or the entire genome, if you did not specify a region). Can you count how many reads map to the reference and see if the number matches what eventalign reports?

Thanks, Jared

jts avatar Jul 21 '22 17:07 jts

Hi,

Thanks for your answer. When running samtools flagstats on the corresponding sam file I get the following output:

3077261 + 0 in total (QC-passed reads + QC-failed reads) 3074619 + 0 primary 0 + 0 secondary 2642 + 0 supplementary 0 + 0 duplicates 0 + 0 primary duplicates 470309 + 0 mapped (15.28% : N/A) 467667 + 0 primary mapped (15.21% : N/A) 0 + 0 paired in sequencing 0 + 0 read1 0 + 0 read2 0 + 0 properly paired (N/A : N/A) 0 + 0 with itself and mate mapped 0 + 0 singletons (N/A : N/A) 0 + 0 with mate mapped to a different chr 0 + 0 with mate mapped to a different chr (mapQ>=5)

The reported mapped reads by flagstats and the ones from eventalign aren't quite matching.

Thanks. Núria

NuriaDiaz avatar Jul 21 '22 17:07 NuriaDiaz

Could you try:

samtools view -F 4 alignments.bam | wc

jts avatar Jul 21 '22 17:07 jts

sure, gives me this output: 470309 9411409 788813891

NuriaDiaz avatar Jul 21 '22 17:07 NuriaDiaz

Are you using the latest version of nanopolish?

jts avatar Jul 21 '22 18:07 jts

I'm using nanopolish 0.13.2, the one that got downloaded via conda when I installed the environment in May this year. I've just double checked and Nanopolish got updated to 0.14.0 6 hours ago in the conda repository. Is this fixed in the new version?

NuriaDiaz avatar Jul 21 '22 18:07 NuriaDiaz

I still don't know the root cause of this issue but I'd appreciate if you could run 0.14.0 and let me know if you get the same count.

Jared

jts avatar Jul 21 '22 18:07 jts

Actually I think it is fixed in 0.14.0. Reviewing the changes between versions, I remember now that there was a double-counting of reads in 0.13.2. It doesn't effect the result, only the final status line. Sorry for the trouble.

https://github.com/jts/nanopolish/commit/7fde4f482f5c0a04fd135178dc83be91c16fe85d

jts avatar Jul 21 '22 18:07 jts