
Accounting for covariates in my analysis

Open priyatamapandey opened this issue 2 years ago • 1 comments

Hi, I am using methylKit for my RRBS data. I also have some covariates that I want to include so I can check their effect on our treatments. I generated a few cluster figures and ran some experiments based on the methylKit how-to manual. Initially, I plotted the clustering using test (diabetic) and control (non-diabetic) only:

Screen Shot 2021-12-10 at 1 09 16 PM

Later, I added the batch-effect step to check the association with another covariate, obesity status (obese vs. lean). Although I do not consider this a batch effect, I want to see the cluster separation in the test and control groups before proceeding further.

Screen Shot 2021-12-10 at 1 13 17 PM

I am not able to understand how to interpret the association. PC1 mostly shows the same value for all the samples, and 52% of PC1 shows the association with the obesity status, but there are other PCs whose values are higher than PC1's, though not in terms of variance. I removed PC1 with `newMeth=removeComp(meth,comp=1)` and plotted the newMeth object again. The clustering is now better than before.

Screen Shot 2021-12-10 at 1 21 49 PM

Further, I tried the dataSim function using the age covariate, because the obesity covariate did not work in the function (please let me know how to resolve this issue; it has two categories, lean and obese). The clusters separated very well.

```r
covariates = data.frame(age = mouse_covariate$Age)

sim.methylBase <- dataSim(replicates = 13, sites = 1000,
                          treatment = c(0,1,1,0,0,0,1,0,0,1,0,0,1),
                          covariates = covariates,
                          sample.id = smpl_id$smpID)

clusterSamples(sim.methylBase, dist = "correlation", method = "ward", plot = TRUE)
```

Screen Shot 2021-12-10 at 1 24 05 PM

So, just to understand this: does it mean that age is an important covariate that has an effect on my treatment, and that I should account for it when performing differential methylation? Also, when running calculateDiffMeth, which object should I pass to the function? Should it be my meth object rather than sim.methylBase, since the latter only shows the effect of the covariate?

Thank you for your help and suggestions; they would be greatly appreciated! Priya

priyatamapandey, Dec 10 '21 21:12

Hi @priyatamapandey,

Sorry for the late reply, I'll try to help you with your questions.

Concerning the batch correction, the association values that are part of the assocComp() output show the association p-values between sample annotations and principal components (https://rdrr.io/bioc/methylKit/man/assocComp-methods.html). Thus, your third (and maybe sixth) principal component seems to show some association with the obesity status. I would also suggest viewing the initial and corrected clustering in a PCA plot using the PCASamples() function. Maybe this will add some additional insight.
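The check described above can be sketched roughly as follows. This is only an illustration on the example data shipped with methylKit (`methylBase.obj`, four samples); the `obesity` and `age` values are made up for the sketch and are not taken from the study in question.

```r
library(methylKit)

# Example methylBase object bundled with the package (4 samples)
data(methylKit)

# Hypothetical sample annotation, one row per sample, in sample order
sampleAnnotation <- data.frame(
  obesity = factor(c("lean", "obese", "lean", "obese")),
  age     = c(12, 14, 13, 15)
)

# Association between each annotation column and each principal component
assoc <- assocComp(mBase = methylBase.obj, sampleAnnotation)
assoc$association   # p-values: rows are annotations, columns are PCs
assoc$vars          # percent variation explained by each PC

# PCA plot before removing any component
PCASamples(methylBase.obj)

# Remove the first principal component and plot again for comparison
newObj <- removeComp(methylBase.obj, comp = 1)
PCASamples(newObj)
```

A small p-value in a given row of `assoc$association` suggests that the corresponding principal component is associated with that annotation; which component to remove, if any, remains a judgment call.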

About the dataSim() problem with the obesity covariate, you might need to wrap categorical variables in an as.integer(as.factor()) call to translate the labels into integers, because dataSim() requires numerical input for the covariates. Also, it is currently not possible to have more than one covariate for the simulation.
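As a minimal base-R sketch of that conversion (the `obesity` labels below are hypothetical placeholders, not the real sample sheet):

```r
# Hypothetical obesity labels for 13 samples (not the real study data)
obesity <- c("lean", "obese", "obese", "lean", "lean", "lean", "obese",
             "lean", "lean", "obese", "lean", "lean", "obese")

# as.factor() orders the levels alphabetically ("lean", "obese");
# as.integer() then returns the level index for each sample
obesity_code <- as.integer(as.factor(obesity))

# Single-column data frame, matching the one-covariate limit noted above
covariates <- data.frame(obesity = obesity_code)

# Cross-tabulate labels against codes to confirm the mapping
print(table(obesity, obesity_code))
```

The resulting `covariates` data frame could then be passed as `covariates = covariates` in the dataSim() call from the question, in place of the age data frame.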

Concerning the simulation with the mouse age, based on the clustering there does not seem to be a strong effect of age on the treatment. However, you may include the age covariate in the assocComp() call, since this function can handle multiple covariates.

Finally, unless you are strongly interested in the effect of the covariate on the simulation, you should definitely pass the initial or the batch corrected methylBase object to calculateDiffMeth().
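If you do decide to account for a covariate at this step, calculateDiffMeth() accepts a `covariates` data frame. The following is only a sketch on the bundled example data; the `age` values are invented for illustration, and the overdispersion and test settings are one possible choice rather than a recommendation for your data.

```r
library(methylKit)

# Example methylBase object bundled with the package (4 samples)
data(methylKit)

# Hypothetical covariate, one row per sample, in sample order
covariates <- data.frame(age = c(30, 80, 30, 80))

# Differential methylation on the real (not simulated) methylBase object,
# with the covariate included in the model
myDiff <- calculateDiffMeth(methylBase.obj,
                            covariates = covariates,
                            overdispersion = "MN",
                            test = "Chisq")

# Inspect the first few tested sites
head(getData(myDiff))
```

With your own data, `methylBase.obj` would be replaced by your `meth` object (or the batch-corrected `newMeth`), and the covariate data frame would need one row per sample in the same order as the samples.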

I hope this helps you to proceed with your analysis. Best, Alex

alexg9010, Feb 18 '22 17:02