Dealing with batch-effects

Open Thapeachydude opened this issue 8 months ago • 2 comments

Hello,

I would like to try MOFA+ on some CITE-seq data, and I was wondering if you could provide some recommendations on how to deal with batch effects. We're generally working with data derived from 2-3 different experiments, between which we observe minor batch effects in the RNA component and quite significant batch effects in the surface protein component.

I've found that fastMNN gives good batch-corrected, PCA-like embeddings, but the reconstructed counts should be used with caution, as they can contain negative values. I saw that your FAQ mentions limma, but if I recall correctly, benchmarks generally show that it corrects batch effects less well on single-cell data.

Any insights or recommendations would be much appreciated, thanks!

Thapeachydude, Apr 21 '25 17:04

One thing to try is training a MOFA+ model and checking whether the first few factors capture the batch effect, and whether any of the remaining factors also correlate with the experiment covariate.
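As a rough sketch of that check in Python: assuming the factor values have already been exported from the trained model as a cells x factors array, and that the experiment label is known for each cell, the fraction of each factor's variance explained by batch can be computed as a one-way ANOVA R². The function name `batch_r2` and the toy data are made up for illustration and are not part of MOFA+.

```python
import numpy as np

def batch_r2(factors, batch):
    """Fraction of each factor's variance explained by the batch label.

    factors: (n_cells, n_factors) array of factor values (assumed to be
             exported from a trained model beforehand).
    batch:   length n_cells sequence of batch/experiment labels.
    """
    factors = np.asarray(factors, dtype=float)
    batch = np.asarray(batch)
    grand_mean = factors.mean(axis=0)
    ss_total = ((factors - grand_mean) ** 2).sum(axis=0)
    ss_between = np.zeros(factors.shape[1])
    for label in np.unique(batch):
        group = factors[batch == label]
        ss_between += group.shape[0] * (group.mean(axis=0) - grand_mean) ** 2
    # Constant factors have zero total variance; report 0 for them.
    return np.divide(ss_between, ss_total,
                     out=np.zeros_like(ss_between), where=ss_total > 0)

# Toy example: factor 0 separates the two batches, factor 1 does not.
toy_factors = np.array([[0, 1], [0, 2], [0, 3], [1, 1], [1, 2], [1, 3]])
toy_batch = ["exp1", "exp1", "exp1", "exp2", "exp2", "exp2"]
print(batch_r2(toy_factors, toy_batch))
```

A factor with R² close to 1 is essentially a "batch factor", while values near 0 suggest the factor is not driven by the experiment.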

It should also be possible to normalize the protein counts with CLR or dsb and then apply a batch-correction method of your choice that outputs corrected counts. Since the number of surface proteins is usually comparable to the number of components one typically uses, I can imagine you could even replace the count matrix with the batch-corrected embeddings, although this will complicate downstream interpretation.
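For reference, a minimal NumPy sketch of one common CLR variant (per cell, with a pseudocount of 1) is below. This is an assumption about which flavour to use: implementations differ in whether they normalize across proteins or across cells and in how they handle zeros. The function name `clr_per_cell` is hypothetical. dsb is not sketched here since it additionally needs empty-droplet background measurements.

```python
import numpy as np

def clr_per_cell(counts, pseudocount=1.0):
    """Centered log-ratio transform applied to each cell (row).

    counts: (n_cells, n_proteins) array of raw protein counts.
    Returns log(counts + pseudocount) minus its per-cell mean, which is
    the log of each value divided by the cell's geometric mean.
    """
    counts = np.asarray(counts, dtype=float)
    log_counts = np.log(counts + pseudocount)
    return log_counts - log_counts.mean(axis=1, keepdims=True)

# Toy example with two cells and three proteins.
toy_counts = np.array([[10, 10, 10],
                       [0, 1, 3]])
toy_clr = clr_per_cell(toy_counts)
print(toy_clr)
```

The normalized matrix would then go into whichever batch-correction method is chosen before being passed to MOFA+ as the protein view.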

gtca, Apr 22 '25 22:04

Alright, I'll try looking for a "batch factor" or using the corrected counts. Thanks!

Thapeachydude, Apr 22 '25 22:04