rtracklayer
export.bw consumes unusual amount of memory
Hi,
I have a memory-consumption problem with the rtracklayer::export.bw function.
I have an RleList object of ~320 MB, which I obtained by running GenomicAlignments::coverage on a BAM file to get genome-wide coverage scores for an RNA-seq experiment (the target is the human genome).
I would like to export this RleList object to a bigWig file, but it takes up too much memory: the peak memory for exporting this 320 MB object is ~54 GB. When I export the same object with export.bedGraph, the peak memory is around ~2.5 GB, which makes sense. export.bedGraph also has an append mode, which decreases peak memory to ~1 GB if I write the coverage data in chunks.
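For reference, the chunked export looks roughly like this (a sketch with toy data; in my real run the RleList came from GenomicAlignments::coverage on the BAM file, and the file name is illustrative):

```r
library(rtracklayer)
library(S4Vectors)  # Rle / RleList

# Toy coverage standing in for the real ~320 MB RleList (illustrative data).
cov <- RleList(chr1 = Rle(c(0L, 3L, 1L), c(10L, 5L, 10L)),
               chr2 = Rle(c(2L, 0L), c(8L, 12L)))

# Write one chromosome at a time, appending to the same bedGraph file,
# so only one chromosome's runs are expanded in memory at once.
out <- "coverage.bedGraph"
chroms <- names(cov)
for (i in seq_along(chroms)) {
  export.bedGraph(cov[chroms[i]], out, append = (i > 1))
}
```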
I guess it is normal to have some overhead when converting wig to bigWig. The UCSC wigToBigWig utility is said to consume about 50% more memory than the uncompressed wiggle/bedGraph file, but 54 GB of memory consumption is way over the top by that measure.
Does this sound like a bug, or do you think this could be an architecture-related issue? If there is a workaround to decrease memory usage, it would be great to hear.
EDIT: I used the latest rtracklayer_1.44.0 and got the same results with older versions, too.
Best,
Bora
Please try passing fixedSummaries = TRUE.
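For example (a sketch; toy data and an illustrative file name stand in for your coverage object):

```r
library(rtracklayer)
library(S4Vectors)  # Rle / RleList

# Toy RleList standing in for the real coverage object (illustrative data).
cov <- RleList(chr1 = Rle(c(0L, 4L, 1L), c(50L, 20L, 30L)))

# fixedSummaries = TRUE writes fixed-span summary levels instead of
# adaptive ones, which can lower peak memory during export for some inputs.
export.bw(cov, "coverage.bw", fixedSummaries = TRUE)
```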
@lawremi thank you very much. This decreased the memory consumption to ~38 GB, but the size of the bigWig file increased from 125 MB to 2.8 GB. Is that memory and disk-space usage normal for a 320 MB RleList object?
Doesn't seem normal; you seem to have found a pathological case somehow. What happens when you export to wig and use rtracklayer::wigToBigWig()? rtracklayer mostly uses the Kent library under the hood, so there aren't many knobs to tweak.
I tried exporting to bedGraph and converting to bigWig with the rtracklayer::wigToBigWig function, and it consumed the same amount of memory. Then I tried the wigToBigWig tool from the UCSC Kent utilities; that also consumed the same amount.
Could something be wrong with the RleList object? I created it by running GenomicAlignments::coverage on a GAlignments object, which was imported from a BAM file with GenomicAlignments::readGAlignments. I didn't do anything special.
When you say that wigToBigWig() consumes the "same memory", do you mean that it consumes the same amount of memory as export.bw()? If so, then there's probably little we can do but accept that bigWig is a somewhat expensive format to write and store. It's great for random access and summaries, though.
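For instance, regions and summaries can be pulled straight from the bigWig index without loading the whole track (a sketch with a toy file; names and ranges are illustrative):

```r
library(rtracklayer)
library(GenomicRanges)

# Write a small toy bigWig, then query it lazily.
cov <- RleList(chr1 = Rle(c(0L, 5L), c(60L, 40L)))
export.bw(cov, "toy.bw")

bwf <- BigWigFile("toy.bw")
rng <- GRanges("chr1", IRanges(1, 100))
import.bw(bwf, which = rng)        # read only the selected region
summary(bwf, rng, type = "mean")   # per-range summary computed from the index
```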
Yes, exactly. Both rtracklayer::wigToBigWig and UCSC's wigToBigWig consume as much as rtracklayer::export.bw. So I guess we have to accept the fact that it is a memory-intensive job :)
I am also having an issue related to this. When I use export.bw, it consumes ~25 GB of RAM and then fails to release it.
library(GenomicAlignments)
library(rtracklayer)
gr <- readGAlignmentPairs('sample1.bam')  # paired-end alignments from the BAM
gr.cov <- coverage(gr)                    # genome-wide coverage as an RleList
export.bw(gr.cov, 'sample1.bigwig')
Should I be opening a connection to a file manually and closing it after?
For reference, my incoming BAM file is 1.6 GB and the output bigWig is 228 MB.
edit: I've tried running gc() to release the memory, to no avail.
BigWig export is implemented by the Kent library, so there's no support for connections, and the memory does not belong to R for it to be garbage collected. The library does, however, clean up after itself. This is the first I've heard of it leaking memory. Are you sure it's not just cached by the process?
Are you sure it's not just cached by the process?
How can I check this?
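For what it's worth, on Linux the resident memory of the R process can be watched from a shell; Sys.getpid() in R gives the PID. A minimal sketch (querying the current shell as a stand-in):

```shell
# Report the resident set size (RSS, in kilobytes) of a process.
# For the R session, substitute the PID from Sys.getpid().
pid=$$
ps -o rss= -p "$pid"
```

If RSS stays high after export but later allocations don't push it higher, the memory is likely being cached and reused by the process's allocator rather than leaked.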