samtools
Overestimation of number of reads from nanopore data (flagstat)
Same issue as mentioned on the minimap2 tool: https://github.com/lh3/minimap2/issues/236#issue-361097444
For example, for nanopore reads aligned to the host transcriptome, the flagstat output is:
5953480 + 0 in total (QC-passed reads + QC-failed reads)
2961480 + 0 secondary
22696 + 0 supplementary
0 + 0 duplicates
4195469 + 0 mapped (70.47% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
However, the actual number of reads is 2969304, with read lengths of about 750 nt. I am assuming this over-reporting is due to the presence of long reads. Is there a more appropriate way of calculating the number of reads and the % of reads mapped in an alignment file? Can the % of reads mapped still be trusted?
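As a rough check, the flagstat counts above can be reduced to per-read numbers. This is a minimal sketch, assuming that the "in total" and "mapped" lines count alignment records rather than reads, and that every secondary and supplementary record is an extra, mapped alignment of a read that also has a primary record. The function name `primary_read_stats` is hypothetical and not part of samtools.

```python
def primary_read_stats(total, secondary, supplementary, mapped):
    """Derive per-read counts from flagstat record counts.

    Assumption: secondary and supplementary records are additional
    mapped alignments of reads that already have one primary record,
    so subtracting them leaves one record per read.
    """
    primary_total = total - secondary - supplementary
    primary_mapped = mapped - secondary - supplementary
    pct_mapped = 100.0 * primary_mapped / primary_total
    return primary_total, primary_mapped, pct_mapped


# Counts copied from the flagstat output in this issue
primary_total, primary_mapped, pct_mapped = primary_read_stats(
    total=5953480,
    secondary=2961480,
    supplementary=22696,
    mapped=4195469,
)

print(primary_total)        # 2969304, matching the actual number of reads
print(primary_mapped)       # 1211293
print(f"{pct_mapped:.2f}%")  # share of reads with a mapped primary alignment
```

Under these assumptions the subtraction reproduces the 2969304 reads exactly, which suggests the extra records come from secondary and supplementary alignments rather than from read length itself. The same per-read counts should also be obtainable directly from the alignment file with `samtools view -c -F 0x900` (excluding secondary and supplementary records) and `samtools view -c -F 0x904` (additionally excluding unmapped records).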