Overestimation of number of reads from nanopore data (flagstat)

Open rebeelouise opened this issue 4 years ago • 0 comments

This is the same issue as the one reported for minimap2: https://github.com/lh3/minimap2/issues/236#issue-361097444

For example, for nanopore reads aligned to the host transcriptome, the flagstat output is:

5953480 + 0 in total (QC-passed reads + QC-failed reads)
2961480 + 0 secondary
22696 + 0 supplementary
0 + 0 duplicates
4195469 + 0 mapped (70.47% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

However, the actual number of reads is 2969304, with read lengths of about 750 nt. I assume this over-reporting is due to the presence of long reads. Is there a more appropriate way to calculate the number of reads and the percentage of reads mapped in an alignment file? Can the percentage of reads mapped still be trusted?
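A possible way to reconcile the numbers, offered as a sketch rather than a confirmed answer: flagstat counts alignment records, so secondary and supplementary records are included in both "in total" and "mapped". Subtracting them from the total above gives 2969304, which matches the actual read count. The Python below parses the relevant flagstat lines and derives a per-read mapping rate. The helper names (`parse_flagstat`, `primary_stats`) are hypothetical, and the calculation assumes every secondary and supplementary record is flagged as mapped.

```python
import re

# Relevant lines copied from the flagstat output above.
FLAGSTAT = """\
5953480 + 0 in total (QC-passed reads + QC-failed reads)
2961480 + 0 secondary
22696 + 0 supplementary
0 + 0 duplicates
4195469 + 0 mapped (70.47% : N/A)
"""


def parse_flagstat(text):
    """Return {label: QC-passed count} for each flagstat line (hypothetical helper)."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"(\d+) \+ (\d+) (.+)", line.strip())
        if not match:
            continue
        # Drop any trailing parenthetical, e.g. "mapped (70.47% : N/A)" -> "mapped".
        label = match.group(3).split(" (")[0]
        # Keep the first occurrence if a label repeats.
        counts.setdefault(label, int(match.group(1)))
    return counts


def primary_stats(counts):
    """Derive per-read counts by removing secondary and supplementary records.

    Assumption: all secondary and supplementary records are mapped, so they
    can be subtracted from the "mapped" count as well as from the total.
    """
    extra = counts["secondary"] + counts["supplementary"]
    primary = counts["in total"] - extra
    primary_mapped = counts["mapped"] - extra
    return primary, primary_mapped, 100.0 * primary_mapped / primary


counts = parse_flagstat(FLAGSTAT)
primary, primary_mapped, pct = primary_stats(counts)
print(f"primary reads:  {primary}")
print(f"primary mapped: {primary_mapped}")
print(f"mapped per read: {pct:.2f}%")
```

Under these assumptions the per-read mapping rate is lower than the 70.47% reported per record. The same counts can also be taken directly from the alignment file by excluding secondary and supplementary records, for example with `samtools view -c -F 0x900` for primary records and `samtools view -c -F 0x904` for mapped primary records.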

rebeelouise, Feb 19 '21 14:02