Rsamtools icon indicating copy to clipboard operation
Rsamtools copied to clipboard

Rsamtools::pileup counts overlapping paired-end reads twice

Open alkodsi opened this issue 6 years ago • 5 comments

Is there a way to disable this behavior?

alkodsi avatar Oct 10 '19 11:10 alkodsi

can you clarify the problem and indicate the code you are using?

mtmorgan avatar Oct 10 '19 12:10 mtmorgan

The problem is when paired end reads are overlapping, such as the example below:

Read1 (first pair):  ------------------ACGT
Read1 (second pair):                   ACGT--------------------

the pileup function will count the overlapping parts twice although they belong to the same fragment.

A related old issue in samtools: https://www.biostars.org/p/87299/

The code I used:

gr <- GenomicRanges::GRanges(chr, IRanges::IRanges(pos, pos))
sbp <- Rsamtools::ScanBamParam(which = gr)
pileupParam <- Rsamtools::PileupParam(max_depth = 100000, min_base_quality = 20,  min_mapq = 30, distinguish_strands = F, include_deletions = F, include_insertions = F)
    
p <- Rsamtools::pileup(bam, scanBamParam = sbp, pileupParam = pileupParam)

alkodsi avatar Oct 10 '19 12:10 alkodsi

I am using Rsamtools v2.8.0 and pileup still double counts the overlapping section of read pairs. Can I request that it mimic the behavior of "samtools mpileup" which by default uses overlap detection and counts it only once.

veseshan avatar Oct 08 '21 15:10 veseshan

Hi,

Rsamtools::pileup() was implemented years ago (in 2014) by a Bioconductor core team member (Nathaniel Hayden) who has left the team since then. It was a significant effort i.e. Nathaniel spent a couple of months on it and implemented most of it in C/C++. Unfortunately the core team does not have the resources at the moment to fix the double counting problem but a PR would be welcome.

Best, H.

hpages avatar Oct 08 '21 18:10 hpages

Thank you

veseshan avatar Oct 12 '21 13:10 veseshan