rtracklayer

export.bw consumes unusual amount of memory

Open borauyar opened this issue 5 years ago • 9 comments

Hi,

I have a memory consumption problem with the rtracklayer::export.bw function.

I have an RleList object of size ~320 MB, which I obtained by running the GenomicAlignments::coverage function on a BAM file to get genome-wide coverage scores of an RNA-seq experiment (the target is the human genome).

Then, I would like to export this RleList object to a bigwig file; however, it takes up too much memory. The peak memory for exporting this 320 MB object is ~54 GB. When I export the same object with the export.bedGraph function, the peak memory is ~2.5 GB, which makes sense. export.bedGraph also has an append mode, which decreases peak memory to ~1 GB if I write the coverage data in chunks.
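For reference, the chunked export can be sketched roughly like this. It is a minimal sketch, not my exact script: the toy `cov` object stands in for the genome-wide coverage, the output path is a temporary file, and writing one chromosome per chunk is just one possible way to split the data.

```r
library(GenomicRanges)
library(rtracklayer)

# Toy stand-in for the RleList returned by coverage() (assumption for illustration)
cov <- RleList(chr1 = Rle(c(0L, 3L, 1L), c(10L, 5L, 20L)),
               chr2 = Rle(c(2L, 0L), c(8L, 12L)))
out <- tempfile(fileext = ".bedGraph")

# Write one chromosome at a time; every chunk after the first is appended,
# so only a single chromosome is expanded into a GRanges at any moment.
for (i in seq_along(cov)) {
  chunk <- as(cov[i], "GRanges")  # runs become ranges with a 'score' column
  export.bedGraph(chunk, out, append = i > 1)
}
```

With real data, `cov` would be the coverage object and `out` a permanent file path.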

I guess it is normal to have some overhead when converting wig to bigwig. The UCSC wigToBigWig utility is said to use about 50% more memory than the size of the uncompressed wiggle/bedgraph file. But 54 GB of memory is way over the top by that measure.

Does this sound like a bug, or do you think this could be an architecture-related issue? If there is a workaround to decrease memory usage, it would be great to hear.

EDIT: I used the latest rtracklayer_1.44.0, and got the same results with older versions, too.

Best,

Bora

borauyar avatar Jul 08 '19 10:07 borauyar

Please try passing fixedSummaries = TRUE.
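A minimal sketch of that call, assuming a small toy RleList in place of the real coverage object and a temporary output path (both are placeholders for illustration):

```r
library(GenomicRanges)
library(rtracklayer)

# Toy stand-in for the coverage RleList (assumption for illustration)
cov <- RleList(chr1 = Rle(c(0L, 3L, 1L), c(10L, 5L, 20L)),
               chr2 = Rle(c(2L, 0L), c(8L, 12L)))
out <- tempfile(fileext = ".bw")

# fixedSummaries = TRUE asks for summaries at fixed resolutions
# instead of resolutions determined dynamically from the data.
export.bw(cov, out, fixedSummaries = TRUE)
```

Note that BigWig export is not available on every platform (for example, it has not been supported on Windows).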

lawremi avatar Jul 08 '19 16:07 lawremi

@lawremi thank you very much. This decreased the memory consumption to ~38 GB, but the size of the bigwig file increased from 125MB to 2.8GB. Is that memory and disk space consumption normal for an RleList object of 320MB?

borauyar avatar Jul 08 '19 17:07 borauyar

Doesn't seem normal. You seem to have found a pathological case somehow. What happens when you export to wig and use rtracklayer::wigToBigWig()? rtracklayer is mostly using the Kent library under the hood, so there aren't a lot of knobs to tweak.

lawremi avatar Jul 10 '19 02:07 lawremi

I tried exporting to bedgraph and converting to bigwig using the rtracklayer::wigToBigWig function, and it consumed the same memory. Then I tried using the wigToBigWig tool from the UCSC Kent utils; that also consumed the same memory.

Can something be wrong with the RleList object? I just created it using the GenomicAlignments::coverage function on a GAlignments object, which was imported from a BAM file using the GenomicAlignments::readGAlignments function. I didn't do anything special.
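Roughly, the two-step route looks like the sketch below. The toy `cov` object and the temporary file paths are placeholders for illustration, and building the Seqinfo from the lengths of the RleList is an assumption about how the chromosome sizes are supplied.

```r
library(GenomicRanges)
library(rtracklayer)

# Toy stand-in for the coverage RleList (assumption for illustration)
cov <- RleList(chr1 = Rle(c(0L, 3L, 1L), c(10L, 5L, 20L)),
               chr2 = Rle(c(2L, 0L), c(8L, 12L)))

# Step 1: write the coverage as a bedGraph text file
bg <- tempfile(fileext = ".bedGraph")
export.bedGraph(as(cov, "GRanges"), bg)

# Step 2: convert the text file to bigwig; wigToBigWig() needs the
# chromosome lengths, supplied here as a Seqinfo object
si <- Seqinfo(seqnames = names(cov), seqlengths = lengths(cov))
bw <- tempfile(fileext = ".bw")
wigToBigWig(bg, seqinfo = si, dest = bw)
```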

borauyar avatar Jul 11 '19 10:07 borauyar

When you say that wigToBigWig() consumes the "same memory" do you mean that it consumes the same amount of memory as export.bw()? If so, then there's probably little we can do but accept that bigwig is a somewhat expensive format to write and store. It's great for accessing and getting summaries from though.

lawremi avatar Jul 11 '19 11:07 lawremi

Yes, exactly. Both rtracklayer::wigToBigWig and UCSC's wigToBigWig consume as much memory as rtracklayer::export.bw. So, I guess, we have to accept the fact that it is a memory-intensive job :)

borauyar avatar Jul 11 '19 11:07 borauyar

I am also having an issue related to this. When I use export.bw, it consumes ~25 GB of RAM and then fails to release it.

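library(GenomicAlignments)
library(rtracklayer)
# Read paired-end alignments from the BAM file
gr <- readGAlignmentPairs('sample1.bam')
# Compute per-base coverage as an RleList
gr.cov <- coverage(gr)
# Write the coverage to a bigwig file
export.bw(gr.cov, 'sample1.bigwig')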

Should I be opening a connection to a file manually and closing it after?

For reference, my incoming BAM file is 1.6 GB and the output bigwig is 228 MB.

edit: I've tried running gc() to release it, to no avail.

doliv071 avatar Jun 30 '22 14:06 doliv071

BigWig export is implemented by the Kent library, so there's no support for connections, and the memory does not belong to R for it to be garbage collected. The library does, however, clean up after itself. This is the first I've heard of it leaking memory. Are you sure it's not just cached by the process?

lawremi avatar Jun 30 '22 21:06 lawremi

> Are you sure it's not just cached by the process?

How can I check this?

doliv071 avatar Jul 01 '22 14:07 doliv071
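
One possible way to check, as a rough sketch that assumes a Linux system (it reads `/proc/self/status`): compare the resident memory of the R process with the memory R itself reports through gc(). The helper names `rss_mb`, `r_heap_mb`, and `report` are hypothetical, and the numeric vector is only a stand-in for the export.bw() call.

```r
# Resident set size of the current R process in MB (Linux only).
rss_mb <- function() {
  status <- readLines("/proc/self/status")
  line <- grep("^VmRSS:", status, value = TRUE)
  as.numeric(gsub("[^0-9]", "", line)) / 1024  # value is reported in kB
}

# Memory managed by R itself, in MB (sum of the "used (Mb)" column of gc()).
r_heap_mb <- function() sum(gc()[, 2])

report <- function(label) {
  cat(sprintf("%-12s RSS = %8.1f MB, R heap = %8.1f MB\n",
              label, rss_mb(), r_heap_mb()))
}

report("start")
x <- numeric(1e7)  # stand-in for the export step (about 80 MB)
report("after work")
rm(x)
report("after rm")
```

If the resident size stays high after the export while the R heap is small, the memory is held outside R's heap. Repeating the export and watching whether the resident size keeps growing or levels off should help tell a real leak from memory that the process has freed internally but not yet returned to the operating system.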