methylKit
methylKit copied to clipboard
Simulator produces unrealistic beta values
I have thoroughly enjoyed using methylkit’s beta value simulation and I am having trouble seeing how this is similar to real data. I am attaching a correlation plot between beta values for methylkit's simulated values vs. real data. I simply want to generate a matrix X of samples by CpG values. Can you help me understand where I am going wrong?
# Libraries
library("methylKit")
library("MASS”)
# Set parameters
set.seed(6)
M = 100
J = 600
percent = 10
# Calculate beta-values from methylkit function
calcBeta <- function(df, nC, nT, patient, df_betas) {
if (patient==1){
#df_betas = setNames(data.frame(matrix(ncol = nrow(df, nrow = 0)), c("name", "age", "gender"))
df_betas = data.frame(patient_1=df[[nC]]/(df[[nC]]+df[[nT]]))
#causal_cpgs.
} else{
df_betas[[paste("patient", patient, sep='_')]] <- df[[nC]]/(df[[nC]]+df[[nT]])
}
df_betas
}
# Simulating methylated sites
simMethyl<-dataSim(replicates=M,sites=J,
treatment=c(rep(1,M)),
sample.ids=c(paste0("sim",1:M))
)
write.csv(simMethyl, paste(OUTDIR,"sim_methylkit.csv", sep="/"), quote=FALSE, row.names=FALSE)
# Calculating beta-values
df_simMethyl = data.frame(simMethyl)
df_betas = data.frame() # TODO: use data.matrix from beginning?
causal_cpgs=list()
# Iterate through replicates (i.e. patients)
for (i in 1:M){
#print(paste0("Patient: ", i))
df <- dplyr::select(df_simMethyl, ends_with(paste0("s",toString(i))))
df_betas<- calcBeta(df, paste("numCs", toString(i), sep=""), paste("numTs", toString(i), sep=""), i, df_betas)
}
# Format CpG beta value matrix
# Transpose to have patients as rows
df_betas <- t(df_betas)
# Convert to matrix for plotting
matr_betas = data.matrix(df_betas)
![Screen Shot 2021-06-27 at 1 36 04 PM](https://user-images.githubusercontent.com/16548432/123566045-b828b180-d78c-11eb-9917-796e5e9a075b.png)
I think there is a misunderstanding about what simulation means or how to use it. Please see https://github.com/BIMSBbioinfo/Strategies_for_analyzing_BS-seq and the associated paper to understand how it is used.
On Sun, Jun 27, 2021 at 11:02 PM Ashley Mae Conard @.***> wrote:
I have thoroughly enjoyed using methylkit’s beta value simulation and I am having trouble seeing how this is similar to real data. I am attaching a correlation plot between beta values for methylkit's simulated values vs. real data. I simply want to generate a matrix X of patients by CpG values. Can you help me understand where I am going wrong?
Libraries
library("methylKit")
library("MASS”)
Set parameters
set.seed(6)
M = 100 # number patients
J = 600 # number CpGs
percent = 10
K = J/percent # number causal CpGs
Calculate beta-values from methylkit function
calcBeta <- function(df, nC, nT, patient, df_betas) {
if (patient==1){ #df_betas = setNames(data.frame(matrix(ncol = nrow(df, nrow = 0)), c("name", "age", "gender")) df_betas = data.frame(patient_1=df[[nC]]/(df[[nC]]+df[[nT]])) #causal_cpgs. } else{ df_betas[[paste("patient", patient, sep='_')]] <- df[[nC]]/(df[[nC]]+df[[nT]]) } df_betas
}
Simulating methylated sites
simMethyl<-dataSim(replicates=M,sites=J,
treatment=c(rep(1,M)), sample.ids=c(paste0("sim",1:M)) )
write.csv(simMethyl, paste(OUTDIR,"sim_methylkit.csv", sep="/"), quote=FALSE, row.names=FALSE)
Calculating beta-values
df_simMethyl = data.frame(simMethyl)
df_betas = data.frame() # TODO: use data.matrix from beginning?
causal_cpgs=list()
Iterate through replicates (i.e. patients)
for (i in 1:M){
#print(paste0("Patient: ", i)) df <- dplyr::select(df_simMethyl, ends_with(paste0("s",toString(i)))) df_betas<- calcBeta(df, paste("numCs", toString(i), sep=""), paste("numTs", toString(i), sep=""), i, df_betas)
}
Format CpG beta value matrix
Transpose to have patients as rows
df_betas <- t(df_betas)
Convert to matrix for plotting
matr_betas = data.matrix(df_betas)
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/al2na/methylKit/issues/230, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAE32EJOSDKD5PF26F6YBJTTU6GVPANCNFSM47MYFONQ .