methylKit icon indicating copy to clipboard operation
methylKit copied to clipboard

segfault when uniting large methylrawlistDB object

Open ethan-shealy opened this issue 1 year ago • 2 comments

Hi, I am having an issue where a my analysis is proceeding normally, then a crash happens which terminates the process. I am working with a large methylation dataset from a whole-genome bisulfite experiment, so have something like 10 million CpGs over 40 samples. The first "unite" function I tried worked fine,

methOBJ <- methRead(location = fileLOC.list,
                     sample.id = as.list(file.list),
                     assembly = "AnoSag2.1", 
                     context = "CpG",
                     mincov = 5,
                     treatment = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))  
Received list of locations.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
Reading file.
 									
 methOBJ <- filterByCoverage(methOBJ,lo.count = 5,lo.perc = NULL,
                             hi.count = NULL,hi.perc = 99.9)
 
 methOBJ <- normalizeCoverage(methOBJ)
 
 methFrame.total <- unite(methOBJ, destrand = TRUE, mc.cores = 1, min.per.group = 30L,
                          save.db = TRUE, suffix = "total_30L_", dbdir = getwd())
destranding...
uniting...
creating directory /scratch/es88065/anolis_methylseq/methylkit/methylDB_2022-10-20_f40
compressing the file with bgzip...
making tabix index...
flatfile located at: /scratch/es88065/anolis_methylseq/methylkit/methylDB_2022-10-20_f40/methylBase_total_30L_.txt.bgz
 
mat.methFrame.total <- percMethylation(methFrame.total)
 
 write.table(mat.methFrame.total, file="/scratch/es88065/anolis_methylseq/methylkit/total_mat_30L.data", quote = F, sep = "\t", row.names = F)
 
 methFrame.total.raw <- as(methFrame.total, "methylBase")
Uncompressing file.
Reading file.
 
 write.table(methFrame.total.raw, file="/scratch/es88065/anolis_methylseq/methylkit/total_meth_30L.data", quote = F, sep = "\t", row.names = F)
 
 rm("methFrame.total", "mat.methFrame.total", "methFrame.total.raw")

but since I want to utilize two different filtering steps I used the 'reorganize' function and tried to unite again. Upon doing this, I received this recursive error and the process crashed.

methOBJ.sex.age <- reorganize(methOBJ,
                           sample.ids=as.list(file.list),
                           treatment= c(4,4,4,10,10,4,11,5,11,5,11,5,9,3,3,3,9,9,1,7,1,7,8,2,2,2,2,8,8,8,8,12,12,6,12,6,6))
 
 methFrame.sex.age <- unite(methOBJ.sex.age, destrand = TRUE, mc.cores = 1, min.per.group = 2L,
                            save.db = TRUE, suffix = "sex_ageDiff_2L", dbdir = getwd())
destranding...

 *** caught segfault ***
address 0x2ba042876020, cause 'memory not mapped'
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation
*** recursive gc invocation

Traceback:
 1: paste(cpgF$chr, cpgF$start, cpgF$end)
 2: .CpG.dinuc.unify(df)
 3: unite(methOBJ.sex.age, destrand = TRUE, mc.cores = 1, min.per.group = 2L,     save.db = TRUE, suffix = "sex_ageDiff_2L", dbdir = getwd())
 4: unite(methOBJ.sex.age, destrand = TRUE, mc.cores = 1, min.per.group = 2L,     save.db = TRUE, suffix = "sex_ageDiff_2L", dbdir = getwd())
An irrecoverable exception occurred. R is aborting now ...

I only utilized ~50% of my allocated memory, so I do not think this is the issue. I previously ran into a segfault error using methylkit's "as" command to extract methylbase data to a matrix. I had to use methylation.matrix <- methylKit:::fread.gzipped(methFrame.total@dbpath, stringsAsFactors = FALSE, data.table = FALSE,skipDecompress=FALSE) to resolve that issue and avoid the segfault error. Is this issue arising from something similar? Thanks for your help.

ethan-shealy avatar Oct 20 '22 20:10 ethan-shealy