ComBat-seq icon indicating copy to clipboard operation
ComBat-seq copied to clipboard

Common not tagwise dispersion used when one level of covariate - Code error?

Open reidbm opened this issue 2 years ago • 0 comments

When a batch has only one level of a covariate (i.e. group) variable, the common dispersion is used for ALL GENES. Why? Shouldn't the code default to the tagwise dispersion (without the group adjustment)? Genes are not adequately debatched by using the common dispersion across all genes...at least in my data. I noticed my covariate adjustment run of combat still had batch effects showing at PC3/PC4 (but not when the group variable was null). I ran dispersion estimates and glms manually and this seems to be the issue in the combat-seq wrapper function....

## Estimate gene-wise dispersion within each batch genewise_disp_lst <- lapply(1:n_batch, function(j){ if((n_batches[j] <= ncol(design)-ncol(batchmod)+1) | qr(mod[batches_ind[[j]], ])$rank < ncol(mod)){ # not enough residual degrees of freedom - use the common dispersion return(rep(disp_common[j], nrow(counts))) }else{ return(estimateGLMTagwiseDisp(counts[, batches_ind[[j]]], design=mod[batches_ind[[j]], ], dispersion=disp_common[j], prior.df=0)) } })

WHY NOT THIS.... genewise_disp_lst <- lapply(1:n_batch, function(j){ if((n_batches[j] <= ncol(design)-ncol(batchmod)+1) | qr(mod[batches_ind[[j]], ])$rank < ncol(mod)){ # not enough residual degrees of freedom - use the common dispersion return(estimateGLMTagwiseDisp(counts[, batches_ind[[j]]], design=NULL, dispersion=disp_common[j], prior.df=0)) }else{ return(estimateGLMTagwiseDisp(counts[, batches_ind[[j]]], design=mod[batches_ind[[j]], ], dispersion=disp_common[j], prior.df=0)) } })

This modification runs the batch effect adjustment per usual, with covariate adjustment when present in a batch. For example, my data is designed as below:

Batch 1 - Tumor --> tagwise
Batch 2 - Normal --> tagwise Batch 3 - Tumor/Normal --> tagwise-adjusted Batch 4 - Tumor --> tagwise

This corrects the residual batch effect seen at PC3/PC4. I haven't checked but my guess is that the genes with batch effects remaining are those that are not adequately modeled using the common dispersion in batches 1, 2, and 4. Please let me know if I am missing a reason why we would want to use common dispersion across all genes in this scenario. However, based on the regression described in the paper, I think the current code does not reflect the regression described.

reidbm avatar Jun 01 '22 22:06 reidbm