
getCounts segfault

Open • jonathanperrie opened this issue 5 years ago • 5 comments

I am running getCounts over ~10k BED files that together occupy 4 GB. My peak set has 218,595 peaks. The segfault still happens when I have access to 2x or 4x the memory I normally use.

```
 *** caught segfault ***
address 0x2aa6cdc61760, cause 'memory not mapped'

Traceback:
 1: vapply(results, function(x) x[["counts"]], rep(0, length(peaks)))
 2: get_counts_from_beds(alignment_files, peaks, paired, colData)
 3: getCounts(bedfiles, peaks, paired = FALSE, format = "bed")
```

Could it be that too many zombie processes are left behind, as described in this Stack Overflow post? https://stackoverflow.com/questions/43050763/weird-segfault-in-r-when-using-mclapply-in-linux

jonathanperrie, Mar 01 '19 23:03
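For scale, the `vapply` call at the top of the traceback collects one numeric vector of `length(peaks)` values per file into a single dense matrix (its `FUN.VALUE`, `rep(0, length(peaks))`, is a double vector). The back-of-envelope estimate below uses the numbers from this report, assuming ~10,000 files and 8-byte doubles and ignoring R's per-object overhead. It is only a sizing sketch and does not by itself prove this is the cause of the segfault.

```r
# Back-of-envelope size of the dense matrix built by
# vapply(results, function(x) x[["counts"]], rep(0, length(peaks)))
n_peaks <- 218595    # peaks in the peak set (from the report)
n_files <- 10000     # approximate number of BED files ("~10k"), an assumption

n_cells <- n_peaks * n_files   # one row per peak, one column per file
bytes   <- n_cells * 8         # 8 bytes per double-precision value
gb      <- bytes / 1e9

# .Machine$integer.max is 2^31 - 1, the largest length of a non-long R vector
exceeds_int_limit <- n_cells > .Machine$integer.max

cat(sprintf("%.0f elements, %.1f GB, exceeds 2^31 - 1: %s\n",
            n_cells, gb, exceeds_int_limit))
# prints: 2185950000 elements, 17.5 GB, exceeds 2^31 - 1: TRUE
```

That single matrix would need roughly 17.5 GB, and its element count is past the 2^31 - 1 length limit beyond which R needs long-vector support. Counting the files in smaller batches keeps each matrix well below both figures.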

Would it be acceptable to call getCounts multiple times and then stitch the results together into a larger SummarizedExperiment?

jonathanperrie, Mar 05 '19 18:03

For those curious, that is what I ended up doing:

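```r
library(chromVAR)
library(SummarizedExperiment)
# Count fragments in two batches of files to limit peak memory use
frag_counts1 <- getCounts(bedfiles1, peaks, paired = FALSE, format = "bed")
frag_counts2 <- getCounts(bedfiles2, peaks, paired = FALSE, format = "bed")
# Both batches use the same peak set, so the count matrices join column-wise
counts_mat <- cbind(assays(frag_counts1)$counts, assays(frag_counts2)$counts)
# Stack the per-sample metadata (the "depth" column) in the same column order
depth <- rbind(colData(frag_counts1), colData(frag_counts2))
frag_counts <- SummarizedExperiment(assays = list(counts = counts_mat),
                                    rowRanges = peaks,
                                    colData = depth)
```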

jonathanperrie, Mar 15 '19 18:03
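The same batch-and-stitch idea can be generalized to any number of files. The following is a minimal sketch, not part of chromVAR: `split_into_batches`, `get_counts_in_batches` and the default `batch_size = 2000` are hypothetical choices. It assumes every batch is counted against the same `peaks` object, so the row ranges match as `cbind` on SummarizedExperiment objects expects, and that each batch returns the same colData columns. Packages are called with `::` so the helpers can be defined without attaching them.

```r
# Hypothetical helper: split file paths into consecutive batches of at most
# `batch_size` files, preserving the original order.
split_into_batches <- function(files, batch_size) {
  stopifnot(batch_size >= 1)
  unname(split(files, ceiling(seq_along(files) / batch_size)))
}

# Hypothetical helper: run chromVAR::getCounts once per batch over the SAME
# peak set, then combine the per-batch SummarizedExperiments column-wise.
get_counts_in_batches <- function(files, peaks, batch_size = 2000,
                                  paired = FALSE, format = "bed") {
  batches <- split_into_batches(files, batch_size)
  results <- lapply(batches, function(batch) {
    se <- chromVAR::getCounts(batch, peaks, paired = paired, format = format)
    gc()  # collect temporaries left over from this getCounts call
    se
  })
  # Rows match because every batch used `peaks`, so columns can be bound
  do.call(BiocGenerics::cbind, results)
}
```

Usage would look like `frag_counts <- get_counts_in_batches(bedfiles, peaks, batch_size = 2000)`. The batch size is a trade-off: smaller batches lower peak memory at the cost of more calls. For reference, a later comment in this thread reports that a batch of about a quarter of ~8k files took 30-40 GB of RAM.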

Thanks for sharing the solution that worked for you!

AliciaSchep, Mar 27 '19 03:03

I also had the same issue with ~8k files. I tried what Froblinkin suggested: running getCounts on a quarter of the files at a time takes about 30-40 GB of RAM and reports no errors. Hopefully this can be optimized. ZB

mzhibo, Apr 03 '19 17:04

I have the same problem, but worse. I have eight sample BAM files without a clear @RG header, and their peak sets, obtained by running getPeaks(), have different numbers of rows; in other words, the peak regions and the number of peaks differ from sample to sample. My fragment_counts and fragment_counts_2 therefore have different rows, so how can I merge their counts with code like `counts_mat=cbind(assays(fragment_counts)$counts,assays(fragment_counts_2)$counts)`? How can I work around this?

pangxueyu233, Jun 27 '19 04:06
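Count matrices built over different peak sets cannot be meaningfully column-bound, because row i refers to a different region in each. One common workaround, offered here as a suggestion rather than something settled in this thread, is to build one shared peak set and count every sample against it. The helper below is a hypothetical sketch: it merges overlapping peaks with `GenomicRanges::reduce` and then resizes them to a fixed width with `GenomicRanges::resize`, since chromVAR expects equal-width peaks. The `width = 500` default is an arbitrary assumption.

```r
# Hypothetical helper (not part of chromVAR): build one shared peak set from
# several per-sample peak files so every sample is counted over identical rows.
build_shared_peaks <- function(peak_files, width = 500) {
  # Read each per-sample peak file into a GRanges object
  per_sample <- lapply(peak_files, chromVAR::getPeaks)
  # Union of all regions; overlapping or adjacent peaks merge into one interval
  merged <- GenomicRanges::reduce(do.call(c, unname(per_sample)))
  # Give every peak the same width, centred on the merged interval.
  # Note: resizing can make neighbouring peaks overlap again.
  GenomicRanges::resize(merged, width = width, fix = "center")
}
```

With a shared set, all eight BAM files can be counted in one call, for example `shared_peaks <- build_shared_peaks(peak_files)` followed by `getCounts(bam_files, shared_peaks, paired = TRUE, format = "bam")`, where `peak_files` and `bam_files` are placeholder names and `paired` should match your data. The result has one row per shared peak, so no cbind of mismatched matrices is needed. If overlapping peaks are a concern, chromVAR's filterPeaks has a `non_overlapping` option.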