deepTools
deepTools copied to clipboard
BamCoverage: last bin
Hi,
I have deeptools 3.3.0 under Python 3.7.0 and I have a question on how bamCoverage handles the last bin.
I want my coverage from bam file to be binned in 5 kb bins, and using bamCoverage this gives me (I only put the end of my file):
chr1 2410000 2415000 10396
chr1 2415000 2420000 17899
chr1 2420000 2425000 15441
chr1 2425000 2432000 9282
My last bin is actually bigger than 5 kb. Because I need to match the binned data with other data, I need actually all bins to be 5 kb, and the last bin will therefore be smaller than 5 kb, which is ok. This is my expected output:
chr1 2410000 2415000 10396
chr1 2415000 2420000 17899
chr1 2420000 2425000 15441
chr1 2425000 2430000 ...
chr1 2430000 2432000 ...
Is there a way to force this behaviour?
The last bin processed should never be larger than the bin size. However, the reported bins will be merged together if they have the same value. I suspect that that's the cause of what you're seeing. If you absolutely require fixed-width bins (i.e., that neighoring bins not be merged if they have the same value), then you can uncomment lines 250-258 in writeBedGraph.py.
Thank you! However, I would in fact require that my entire chromosome is covered. This is the behaviour with the uncommented lines:
chr1 2415000 2420000 83309
chr1 2420000 2425000 190228
chr1 2425000 2430000 5575
And this is what I would require:
chr1 2415000 2420000 83309
chr1 2420000 2425000 190228
chr1 2425000 2430000 5575
chr1 2425000 2432066 somevalue
The end of the last bin is here equal to the chromosome size, even if it is less than 5kb.
It should be covered, can you provide a BAM file and the command that's producing this behavior?
Thanks for having a look!
I'm attaching a bam-file, I gzipped it so that github let's me upload it. I mapped my reads (a subset) onto a 29,966 bp long sequence. If I bin my test.bam using 5kb bins, I'm expecting 5 bins with 5000 bp and 1 bin with 4966 bp (25001 to 29966)
I checked the end with the following commands:
samtools depth test.bam | tail -n 50
... and the end of my chromosome is covered.
I was running bamCoverage
bamCoverage -p 1 -b test.bam --binSize 5000 -o test.5kb.bw
and checked the bins using pyBigWig:
import pyBigWig
threshold = 10 # Change me
bw = pyBigWig.open("test.5kb.bw") # Change me
for chrom, len in bw.chroms().items():
bw.intervals(chrom)
which gives me:
((0, 5000, 7791.0), (5000, 10000, 9769.0), (10000, 15000, 5968.0), (15000, 20000, 24826.0), (20000, 25000, 11531.0))