
Low p values when using testDA_GLMM(): lme4 too sensitive to weights?

Open florian-bach opened this issue 4 years ago • 0 comments

Hi Lukas,

first off, thanks for your work on diffcyt, it's been enormously helpful for my research.

I've mostly been using the edgeR option for my differential abundance analysis, but I was working on an experimental design that would benefit from random effects, so I gave testDA_GLMM() a try. When comparing 50 clusters from 9 individuals between two timepoints each, the results were confusing: all the p values were extremely close to 0. The issue looks very similar to the one described in [https://github.com/lmweber/diffcyt/issues/17]. I had a look at the source code and I think the problem lies with lme4: when the weights are supplied to glmer, my cell counts are apparently so high (between 10,000 and 90,000 per sample) that very small differences in abundance come out as significant.

To see what would happen, I divided all counts in n_cells_smp by 10 (or even by 5 or 2), and the p values rose sharply, so glmer(family = "binomial") seems extremely sensitive to large weights. I then took your testDA_GLMM() code and swapped glmer for MASS::glmmPQL() while keeping the weights unchanged. This approach wasn't as sensitive as testDA_edgeR(), but it didn't appear to produce many false positives, which testDA_GLMM() certainly did. That's why I thought I'd let you know: to me this all looks like an issue with how lme4 deals with weights.
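
The effect of dividing the counts can be illustrated with a minimal sketch. This is not the diffcyt or lme4 code, and it has no random effects. It is a plain pooled two-proportion z-test in Python, using made-up counts for one hypothetical cluster (2.0% vs 2.3% of cells at the two timepoints). It only shows the underlying arithmetic, assuming every cell is treated as an independent binomial trial: the standard error shrinks with the square root of the total count, so the same proportions give a much smaller p value at 50,000 cells than at 5,000.

```python
import math


def two_proportion_p_value(k1, n1, k2, n2):
    """Two-sided p value of a pooled two-proportion z-test."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    # Standard error of the difference under the null of equal proportions.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: cluster holds 2.0% vs 2.3% of cells in both cases.
p_large = two_proportion_p_value(1000, 50_000, 1150, 50_000)  # full counts
p_small = two_proportion_p_value(100, 5_000, 115, 5_000)      # counts / 10

print(f"50,000 cells per sample: p = {p_large:.4g}")
print(f" 5,000 cells per sample: p = {p_small:.4g}")
```

Under these assumptions, dividing the counts by 10 shrinks the z statistic by a factor of sqrt(10) while leaving the proportions untouched, which matches the direction of the change I saw. It does not explain why glmmPQL behaves differently with the same weights.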

Let me know if it would be helpful and I can put together a reprex using my now public data.

Cheers

Florian

florian-bach, Mar 09 '21 13:03