ComBat-seq
ComBat-seq + DESeq2 / WGCNA or DESeq2 (batch as covariate) / ComBat + WGCNA
Hi,
I am using ComBat-seq to remove batch effects from my dataset, and then running DESeq2 on the corrected counts. I was wondering if I could use the same data, after rlog transformation, for WGCNA?
Which pipeline would be better (to get both differentially expressed genes and WGCNA results) -
- ComBat-seq -> DESeq2 -> rlog -> WGCNA
- DESeq2 (batch as covariate) -> rlog -> ComBat -> WGCNA
Thank you!
Hi, I'm currently doing something similar. To answer your question, I would say that batch correction should be the first step, since ComBat-seq requires raw counts as input. So my suggestion is to follow workflow 1.
I do have a question as well. If I run "ComBat_seq(Dataset,batch=my_batch)", is the output the dataset corrected for batch effects?
@Bithorax thanks for your answer to the question! Yes, the output will be the dataset corrected for batch effects.
Thank you for your answer, @Bithorax and @zhangyuqing !
One last question, if you can help. I'm not quite sure when I should specify the "group" parameter and hence "full_mod=TRUE". Do you have an explanation?
@Bithorax Both "group" and "covar_mod" refer to covariates whose signal you would like to keep in your data. So, in differential expression analysis for example, group would be the condition you are comparing. In addition, if you would like to retain information from any other variables, you can specify them in covar_mod. In contrast, "batch" is the variable whose signal you would like to remove from the data.
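To make the distinction concrete, here is a minimal sketch of the two calls. The count matrix, `batch`, and `condition` vectors are made-up toy data, not anything from this thread, and the call is guarded so the snippet is skipped if the sva package is not installed.

```r
# Toy data (hypothetical): 200 genes x 8 samples of negative binomial counts.
set.seed(1)
counts <- matrix(
  rnbinom(200 * 8, mu = 50, size = 5),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("s", 1:8))
)
batch     <- c(1, 1, 1, 1, 2, 2, 2, 2)  # signal to remove
condition <- c(0, 1, 0, 1, 0, 1, 0, 1)  # signal to keep

if (requireNamespace("sva", quietly = TRUE)) {
  # Batch only: no biological variable is included in the model.
  adj_batch_only <- sva::ComBat_seq(counts, batch = batch, group = NULL)

  # Batch plus group: condition is included in the model, so its signal
  # is protected while batch is estimated (full_mod defaults to TRUE).
  adj_with_group <- sva::ComBat_seq(counts, batch = batch, group = condition)
}
```

Additional variables to preserve would go into `covar_mod` as a matrix with one column per covariate. Both calls return an adjusted count matrix with the same dimensions as the input.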
Thanks for the explanation. Just one more question. If specifying "batch" only removes the batch effect from the dataset, then the signal of my variables of interest is automatically kept. Am I wrong?
@Bithorax Unfortunately, in real data we can never be 100% sure that only the batch effect is removed, because we do not truly know how batch has affected the data. We can only guess, and we are guessing these effects using linear models. In linear models, whether or not you include other signals in the model affects your guess of the batch effect.
If you are familiar with linear regression, perhaps you can think of it simply as the difference between estimating the parameters of the two models below:
data ~ batch
data ~ batch + other signals
The parameters for batch are what we are guessing, and they have different interpretations and values in the two models.
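As a rough illustration of the two models, here is a small base R simulation. Everything in it is invented for the example (the sample size, the effect sizes, and the unbalanced design in which condition is partly confounded with batch); it uses ordinary least squares via `lm`, not the negative binomial model that ComBat-seq fits.

```r
# Simulated single-gene example (all values are assumptions for illustration).
set.seed(42)
n <- 40
batch <- factor(rep(c("A", "B"), each = n / 2))

# Unbalanced design: batch A is mostly control, batch B is mostly treated.
condition <- factor(c(
  rep(c("ctrl", "trt"), times = c(15, 5)),
  rep(c("ctrl", "trt"), times = c(5, 15))
))

# True batch effect = 1, true condition effect = 2.
y <- 1.0 * (batch == "B") + 2.0 * (condition == "trt") + rnorm(n, sd = 0.5)

fit_batch_only <- lm(y ~ batch)              # data ~ batch
fit_full       <- lm(y ~ batch + condition)  # data ~ batch + other signals

# Compare the estimated batch coefficient under each model.
est_batch_only <- unname(coef(fit_batch_only)["batchB"])
est_full       <- unname(coef(fit_full)["batchB"])
print(c(batch_only = est_batch_only, full = est_full))
```

In a design like this, the batch-only model should attribute part of the condition effect to batch, so removing its estimated batch effect would also remove some of the signal of interest. With a perfectly balanced design the two estimates would agree much more closely.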
Yes, I see your point and I agree. It would be interesting to compare the two models to see the difference in the signal. But I guess this also depends on the input dataset.
Thanks for the feedback!
@zhangyuqing I'm a bit confused about this, since it looks like option 1 is recommended? My understanding is that the linear model should be run on uncorrected data with batch as a covariate. The statistical results can then be merged back with the ComBat-corrected and normalized counts. Can someone please confirm? Maybe I'm misunderstanding the question somewhere?