deepTools icon indicating copy to clipboard operation
deepTools copied to clipboard

BamCoverage: last bin

Open sarah872 opened this issue 4 years ago • 4 comments

Hi, I have deeptools 3.3.0 under Python 3.7.0 and I have a question on how bamCoverage handles the last bin. I want my coverage from bam file to be binned in 5 kb bins, and using bamCoverage this gives me (I only put the end of my file):

chr1	2410000	2415000	10396
chr1	2415000	2420000	17899
chr1	2420000	2425000	15441
chr1	2425000	2432000	9282

My last bin is actually bigger than 5 kb. Because I need to match the binned data with other data, I need actually all bins to be 5 kb, and the last bin will therefore be smaller than 5 kb, which is ok. This is my expected output:

chr1	2410000	2415000	10396
chr1	2415000	2420000	17899
chr1	2420000	2425000	15441
chr1	2425000	2430000	...
chr1	2430000	2432000	...

Is there a way to force this behaviour?

sarah872 avatar Oct 21 '21 18:10 sarah872

The last bin processed should never be larger than the bin size. However, the reported bins will be merged together if they have the same value. I suspect that that's the cause of what you're seeing. If you absolutely require fixed-width bins (i.e., that neighoring bins not be merged if they have the same value), then you can uncomment lines 250-258 in writeBedGraph.py.

dpryan79 avatar Oct 25 '21 12:10 dpryan79

Thank you! However, I would in fact require that my entire chromosome is covered. This is the behaviour with the uncommented lines:

chr1	2415000	2420000	83309
chr1	2420000	2425000	190228
chr1	2425000	2430000	5575

And this is what I would require:

chr1	2415000	2420000	83309
chr1	2420000	2425000	190228
chr1	2425000	2430000	5575
chr1	2425000	2432066	somevalue

The end of the last bin is here equal to the chromosome size, even if it is less than 5kb.

sarah872 avatar Nov 06 '21 08:11 sarah872

It should be covered, can you provide a BAM file and the command that's producing this behavior?

dpryan79 avatar Nov 06 '21 14:11 dpryan79

Thanks for having a look! I'm attaching a bam-file, I gzipped it so that github let's me upload it. I mapped my reads (a subset) onto a 29,966 bp long sequence. If I bin my test.bam using 5kb bins, I'm expecting 5 bins with 5000 bp and 1 bin with 4966 bp (25001 to 29966)

I checked the end with the following commands:

samtools depth test.bam | tail -n 50

... and the end of my chromosome is covered.

I was running bamCoverage

bamCoverage -p 1 -b test.bam --binSize 5000 -o test.5kb.bw

and checked the bins using pyBigWig:

import pyBigWig
threshold = 10  # Change me
bw = pyBigWig.open("test.5kb.bw")  # Change me
for chrom, len in bw.chroms().items():
    bw.intervals(chrom)

which gives me:

((0, 5000, 7791.0), (5000, 10000, 9769.0), (10000, 15000, 5968.0), (15000, 20000, 24826.0), (20000, 25000, 11531.0))

test.bam.gz

sarah872 avatar Nov 07 '21 13:11 sarah872