
ComBat-seq + DESeq2 / WGCNA or DESeq2 (batch as covariate) / ComBat + WGCNA

Open • sharvarinarendra opened this issue 4 years ago • 9 comments

Hi,

I am using ComBat-seq to remove batch effects from my dataset and then running DESeq2 on the corrected counts. Could I use the same data, after rlog transformation, for WGCNA?

Which pipeline would be better for getting both differentially expressed genes and WGCNA results?

  1. ComBat-seq -> DESeq2 -> rlog -> WGCNA
  2. DESeq2 (batch as covariate) -> rlog -> ComBat -> WGCNA

Thank you!

sharvarinarendra • Jan 31 '21

Hi, I'm currently doing something similar. To answer your question, I would say that batch correction should be the first step, since ComBat-seq requires raw counts as input. So my suggestion is to follow workflow 1.

I have a question as well. If I run "ComBat_seq(Dataset, batch=my_batch)", is the output the dataset corrected for batch effects?

Bithorax • Feb 03 '21

@Bithorax thanks for answering the question! Yes, the output is the dataset corrected for batch effects.

zhangyuqing • Feb 03 '21

Thank you for your answers, @Bithorax and @zhangyuqing!

sharvarinarendra • Feb 05 '21

One last question, if you can help: I'm not quite sure when I should specify the "group" parameter and, with it, "full_mod=TRUE". Do you have an explanation?

Bithorax • Feb 05 '21

@Bithorax Both "group" and "covar_mod" refer to covariates whose signal you would like to keep in your data. In differential expression analysis, for example, "group" would be the condition you are comparing. If you would also like to retain information from other variables, you can specify them in "covar_mod". By contrast, "batch" is the variable whose signal you would like to remove from the data.

zhangyuqing • Feb 05 '21
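A minimal sketch of the batch/group distinction in the reply above, in Python/NumPy, assuming a simple additive linear model for one gene on a log-like scale. This is not ComBat-seq itself (which fits a negative binomial model with empirical Bayes shrinkage); the helper `remove_batch_keep_group` and the toy data are hypothetical. The model includes both batch and group, but only the batch term is subtracted, so the group difference stays in the data.

```python
import numpy as np

def remove_batch_keep_group(y, batch, group):
    """Hypothetical helper: regress out a two-level batch effect from one gene
    while keeping the group (condition) effect in the data.

    y     : expression values for one gene (1-D array)
    batch : 0/1 batch labels, the signal to remove
    group : 0/1 condition labels, the signal to keep
    """
    y = np.asarray(y, dtype=float)
    batch = np.asarray(batch, dtype=float)
    group = np.asarray(group, dtype=float)
    # Design matrix: intercept, batch indicator, group indicator
    X = np.column_stack([np.ones_like(y), batch, group])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Subtract only the batch term; intercept, group effect and residuals stay
    return y - coef[1] * batch

# Hypothetical toy data: group effect +2, batch effect +5, no noise
group = np.array([0, 0, 1, 1, 0, 1])
batch = np.array([0, 0, 0, 1, 1, 1])
y = 10 + 2 * group + 5 * batch

adjusted = remove_batch_keep_group(y, batch, group)
print(np.round(adjusted, 6))
```

With this noise-free toy data the adjusted values are (up to floating-point rounding) 10 for group 0 and 12 for group 1: the +2 group effect survives while the +5 batch shift is gone.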

Thanks for the explanation. Just one doubt: if specifying "batch" only removes the batch effect from the dataset, then the signal of my variables of interest should be kept automatically. Am I wrong?

Bithorax • Feb 05 '21

@Bithorax Unfortunately, in real data we can never be 100% sure that only the batch effect is removed, because we do not truly know how batch has affected the data; we can only guess. We make these guesses using linear models, and in a linear model, whether or not you include other signals affects your estimate of the batch effect.

If you are familiar with linear regression, perhaps you can think of it simply as the difference between estimating the parameters of these two models:

  1. data ~ batch
  2. data ~ batch + other signals

The batch parameters are what we are estimating, and they have different interpretations and values in the two models.

zhangyuqing • Feb 05 '21
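The two-model comparison above can be reproduced with a small NumPy sketch, assuming ordinary least squares on a single gene and a hypothetical, noise-free toy design in which batch 1 contains mostly group-1 samples. The helper `ols` is illustrative only and is not part of any batch-correction package.

```python
import numpy as np

def ols(y, *columns):
    """Hypothetical helper: OLS fit of y on an intercept plus the given
    columns; returns the coefficients [intercept, col1, col2, ...]."""
    X = np.column_stack([np.ones(len(y))] + [np.asarray(c, dtype=float) for c in columns])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef

# Unbalanced toy design: batch 1 holds mostly group-1 samples
batch = np.array([0, 0, 0, 0, 1, 1, 1, 1])
group = np.array([0, 0, 0, 1, 0, 1, 1, 1])
# True model: intercept 3, batch effect +1, group effect +4, no noise
y = 3 + 1 * batch + 4 * group

batch_only = ols(y, batch)[1]               # data ~ batch
batch_and_group = ols(y, batch, group)[1]   # data ~ batch + other signals
print(batch_only, batch_and_group)
```

Here `data ~ batch` attributes part of the group signal to batch (an estimate of 3 instead of the true 1), so subtracting that estimate would also strip some of the group effect; `data ~ batch + group` recovers 1.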

Yes, I see your point and I agree. It would be interesting to compare the two models to see the difference in the signal, but I guess this also depends on the input dataset.

Thanks for the feedback!

Bithorax • Feb 05 '21

@zhangyuqing I'm a bit confused about this, since it looks like option 1 is being recommended. My understanding is that the linear model should be run on uncorrected data with batch as a covariate, and the statistical results can then be merged back with the ComBat-corrected, normalized counts. Can someone please confirm? Maybe I'm misunderstanding the question somewhere.

ahdee • Aug 13 '21
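On the question of running the model on uncorrected data with batch as a covariate: in ordinary least squares, the point estimate of the group effect is the same whether you (a) fit `y ~ batch + group` on uncorrected data or (b) subtract the batch term estimated from that same model and then fit `y ~ group`. The NumPy sketch below checks this on hypothetical toy data for one gene. It is only a linear-model illustration: it does not model DESeq2's negative binomial GLM or ComBat-seq's shrinkage, and it compares point estimates only, not standard errors.

```python
import numpy as np

def ols(y, *columns):
    """Hypothetical helper: OLS with an intercept; returns all coefficients."""
    X = np.column_stack([np.ones(len(y))] + [np.asarray(c, dtype=float) for c in columns])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef

# Hypothetical toy data: batch partly confounded with group, small fixed noise
batch = np.array([0, 0, 0, 0, 1, 1, 1, 1])
group = np.array([0, 0, 0, 1, 0, 1, 1, 1])
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.1])
y = 3 + 1 * batch + 4 * group + noise

# Route (a): uncorrected data, batch as a covariate (y ~ batch + group)
intercept, b_batch, b_group = ols(y, batch, group)
est_covariate = b_group

# Route (b): remove the estimated batch term first, then fit y_adj ~ group
y_adj = y - b_batch * batch
est_corrected = ols(y_adj, group)[1]

# Naive: uncorrected data with batch ignored (y ~ group)
est_naive = ols(y, group)[1]

print(est_covariate, est_corrected, est_naive)
```

Routes (a) and (b) agree (about 3.67 here), while ignoring batch entirely gives 4.3 because group is partly confounded with batch.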