Rsamtools
Rsamtools copied to clipboard
Rsamtools::pileup counts overlapping paired-end reads twice
Is there a way to disable this behavior?
can you clarify the problem and indicate the code you are using?
The problem is when paired end reads are overlapping, such as the example below:
Read1 (first pair): ------------------ACGT
Read1 (second pair): ACGT--------------------
the pileup function will count the overlapping parts twice although they belong to the same fragment.
A related old issue in samtools: https://www.biostars.org/p/87299/
The code I used:
gr <- GenomicRanges::GRanges(chr, IRanges::IRanges(pos, pos))
sbp <- Rsamtools::ScanBamParam(which = gr)
pileupParam <- Rsamtools::PileupParam(max_depth = 100000, min_base_quality = 20, min_mapq = 30, distinguish_strands = F, include_deletions = F, include_insertions = F)
p <- Rsamtools::pileup(bam, scanBamParam = sbp, pileupParam = pileupParam)
I am using Rsamtools v2.8.0 and pileup still double counts the overlapping section of read pairs. Can I request that it mimic the behavior of "samtools mpileup" which by default uses overlap detection and counts it only once.
Hi,
Rsamtools::pileup() was implemented years ago (in 2014) by a Bioconductor core team member (Nathaniel Hayden) who has left the team since then. It was a significant effort i.e. Nathaniel spent a couple of months on it and implemented most of it in C/C++. Unfortunately the core team does not have the resources at the moment to fix the double counting problem but a PR would be welcome.
Best, H.
Thank you