methylKit
methylKit copied to clipboard
Error reading in file
Hello,
I have written an R script that uses methylKit to ultimately get DMRs. In this script, I read in multiple .bed files into a single methRead() call. The script completes successfully on at least 3 different sets of data I have input.
I am encountering an odd issue however with one dataset where the following error occurs:
Received list of locations.
Reading file.
Reading file.
Erreur in `.rowNamesDF<-`(x, value = value) :
length of 'row.names' incorrect
Calls : methRead ... row.names<- -> row.names<-.data.frame -> .rowNamesDF<-
Execution stopped
As you can see, this happens when reading in file 2. However, these files are all generated in an identical fashion by a master script. Here is what their head looks like:
==> methylation/barcode01/1821_sub.bed <==
contig_6061 3 4 m 1 + 3 4 255,0,0 1 0.00 0 1 0 0 0 0 2
contig_6061 13 14 m 2 + 13 14 255,0,0 2 0.00 0 2 0 0 0 1 0
contig_6061 44 45 m 3 + 44 45 255,0,0 3 0.00 0 3 0 0 0 0 0
contig_6061 45 46 m 1 - 45 46 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_6061 52 53 m 1 - 52 53 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_6061 73 74 m 1 - 73 74 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_6061 139 140 m 1 + 139 140 255,0,0 1 0.00 0 1 0 0 0 0 3
contig_6061 164 165 m 1 + 164 165 255,0,0 1 0.00 0 1 0 0 0 0 3
contig_6061 189 190 m 1 + 189 190 255,0,0 1 0.00 0 1 0 0 0 0 4
contig_6061 190 191 m 1 - 190 191 255,0,0 1 0.00 0 1 0 0 0 0 0
==> methylation/barcode02/1821_sub.bed <==
contig_6061 3066 3067 m 1 - 3066 3067 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_6061 3149 3150 m 1 - 3149 3150 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_6061 3297 3298 m 1 - 3297 3298 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_6061 3327 3328 m 1 - 3327 3328 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_6061 3350 3351 m 1 - 3350 3351 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_6061 3383 3384 m 1 - 3383 3384 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_6061 3416 3417 m 1 - 3416 3417 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_6061 3455 3456 m 1 - 3455 3456 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_6061 3495 3496 m 1 - 3495 3496 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_6061 3517 3518 m 1 - 3517 3518 255,0,0 1 0.00 0 1 0 0 0 0 0
==> methylation/barcode03/1821_sub.bed <==
contig_6061 3 4 m 1 + 3 4 255,0,0 1 0.00 0 1 0 0 0 0 0
contig_6061 4 5 m 1 - 4 5 255,0,0 1 0.00 0 1 0 0 0 2 0
contig_6061 14 15 m 1 - 14 15 255,0,0 1 0.00 0 1 0 0 0 0 2
contig_6061 33 34 m 1 + 33 34 255,0,0 1 0.00 0 1 0 0 0 1 0
contig_6061 44 45 m 2 + 44 45 255,0,0 2 0.00 0 2 0 0 0 0 0
contig_6061 45 46 m 3 - 45 46 255,0,0 3 0.00 0 3 0 0 0 0 0
contig_6061 79 80 m 1 - 79 80 255,0,0 1 0.00 0 1 0 0 0 2 0
contig_6061 86 87 m 1 - 86 87 255,0,0 1 0.00 0 1 0 0 0 2 0
contig_6061 119 120 m 1 - 119 120 255,0,0 1 0.00 0 1 0 1 0 1 0
contig_6061 127 128 m 1 - 127 128 255,0,0 1 0.00 0 1 0 0 0 0 2
Relevant script:
pipeline = list(fraction = FALSE, chr.col = 1, start.col = 2, end.col = 3, coverage.col = 10, strand.col = 6, freqC.col = 11)
# Get list of bedMethyl files to analyze
files.list = list()
samples = list()
treatment = vector()
t = 0
# For each barcode
for (i in 1:7) {
# Skip the control barcode
if (i == 4) next
# Add that barcode to the sample list
sample = paste0("barcode0", i)
samples <- c(samples, sample)
# Get file path to the bedmethyl file
files.list <- c(files.list, paste0("/researchdrive/gkanaan/seaice_methylation/methylation/", sample, "/", assembly, ".bed"))
# Add the correct treatment (top, middle, bottom)
treatment = c(treatment, t%%3)
t = t + 1
}
meth_obj = methRead(files.list, sample.id = samples, assembly=assembly, treatment=treatment, context="CHH", pipeline=pipeline, header=FALSE, mincov=5)
I appreciate any tips.
Ok it seems that this is due to the fact that 0 rows are left after coverage filtering. MethylKit unfortunately does not handle this very gracefully. @alexg9010 is there a way to skip files with 0 rows after coverage filtering instead of crashing?
Browse[5]> n
debug: data[, coverage.col] = round(data[, coverage.col])
Browse[5]> n
debug: data = data[data[, coverage.col] >= mincov, ]
Browse[5]> n
debug: strand = rep("+", nrow(data))
Browse[5]> print(data)
[1] V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18
<0 lignes> (ou 'row.names' de longueur nulle)
Hi @Ge0rges ,
You are right, this edge case is not handled very well. For now, you could use tryCatch to filter files with zero rows. using this code snippet:
meth_list <- lapply(seq_along(file.list),
function(i) {
tryCatch(
expr = {
methRead(
location = file.list[[i]],
sample.id = as.list(samples)[[i]],
assembly=assembly,
# treatment=treatment,
context="CHH",
pipeline=pipeline,
header=FALSE,
mincov=5)
)
},
error = function(e) {
message(paste0("file ", file.list[[i]], " failed to load."))
return(NULL)
}
)
})
failed_samples <- sapply(meth_list, is.null)
meth_obj <- methylRawList(meth_list[!failed_samples], treatment = treatment[!failed_samples])
Best Alex
Thanks for that solution.