
Interpretation of residuals for beta regression with glmmTMB

florianhartig opened this issue 3 years ago · 3 comments

From a user

I have fitted a GLMM (glmmTMB) with Pi (polymorphism, nucleotide diversity) as a function of community Shannon diversity. I added species name (bacteria) as a random effect on slope and intercept because we expect variation among different species, and I added sample ID nested in subject ID as a random effect on the intercept to account for dependence between observations.

My response is a proportion between 0 and 1 (0 <= Pi <= 1), so I used a beta distribution in glmmTMB.
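
A minimal sketch of a model with this structure, assuming hypothetical column names (pi, shannon, species, subject_id, sample_id) and a data frame dat, none of which are from the original post:

```r
library(glmmTMB)

# Beta GLMM: random slope + intercept for species, nested random
# intercepts for sample within subject (names are assumptions).
# Note: beta_family() requires responses strictly inside (0, 1).
m1 <- glmmTMB(pi ~ shannon + (shannon | species) + (1 | subject_id / sample_id),
              family = beta_family(link = "logit"),
              data = dat)
```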

I used DHARMa to check whether the residual normality and homoscedasticity assumptions are fulfilled; I am attaching the GLMM outputs and DHARMa plots.
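
A minimal sketch of that DHARMa workflow, assuming the model object m1 from the sketch above:

```r
library(DHARMa)

# Simulate scaled (quantile) residuals from the fitted model
res <- simulateResiduals(m1, n = 1000)

# Left panel: QQ plot against the uniform distribution (plus tests);
# right panel: residuals vs. predicted values
plot(res)
```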

Model 1 is my basic model. The normality and homoscedasticity plots show some deviation, so I tried making changes to model 1 to improve the fit, but the diagnostic plots look the same. At the same time, I am aware that some deviation is not of concern and is due to the randomness of the residuals.

For another question, I added a covariate to model 1 to test the hypothesis that genome size (estimated by the accessory genes present) has an effect on the relationship between Pi and Shannon diversity, and for this model, the homoscedasticity plot looks even worse.

florianhartig avatar Feb 01 '21 18:02 florianhartig

OK, your first 4 models essentially look like this (this is M1).

[Image: DHARMa diagnostic plots for model 1 (M1)]

My concern with this figure is that, as you can see, no data reach the bottom of the plot (i.e. the simulated data extend further down than the observed data), while a lot of data points lie outside the simulation envelope at the top.

To me, this looks a bit as if the beta distribution doesn't fit your data very well, and the fact that the pattern is so homogeneous across the predicted values suggests that the model predictions do not differ much between data points.

For the last model (5), things change a bit - here the problem varies with the predicted value. I'm not sure I would agree that this is worse - a more positive spin would be to say that we at least get a better fit for the large values now.

[Image: DHARMa diagnostic plots for model 5]

In general, however, I would say these data don't fit a beta distribution particularly well. This is actually not uncommon in my experience. People choose the beta because it is said to be suitable for continuous proportions, but the beta is not as flexible as many people think, and it often does not fit observed data well. Maybe things could be improved by changing the model structure and/or modelling the dispersion of the beta as well.
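
In glmmTMB, modelling the dispersion amounts to adding a dispformula; a hedged sketch, reusing the hypothetical names from above:

```r
# Let the beta precision parameter vary with the predictor instead of
# being held constant (dispformula = ~ 1 is the default)
m_disp <- glmmTMB(pi ~ shannon + (shannon | species) + (1 | subject_id / sample_id),
                  dispformula = ~ shannon,
                  family = beta_family(link = "logit"),
                  data = dat)
```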

Alternatively, however, I would also suggest exploring whether you can simply fit the data with a linear mixed model plus a suitable transformation of y.
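
One way this could look, as a sketch assuming lme4 and a logit transformation (any monotone transform that stabilises the residuals would do; the clamping bound is arbitrary):

```r
library(lme4)

# Clamp away from 0 and 1 so qlogis() stays finite
dat$pi_logit <- qlogis(pmin(pmax(dat$pi, 1e-4), 1 - 1e-4))

m_lmm <- lmer(pi_logit ~ shannon + (shannon | species) + (1 | subject_id / sample_id),
              data = dat)
```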

florianhartig avatar Feb 01 '21 19:02 florianhartig

Reply from the user:

I tried your suggestions to improve the fit:

  1. By transforming the response (with the powerTransform function) and fitting a linear mixed model to the transformed data, both residual normality and homoscedasticity are improved, but the output is completely different from that of the generalized mixed model.

  2. I fitted the same generalized mixed model but with zero-inflation and dispersion terms; the diagnostic plots look very similar to those of the initial GLMM. (Both variants are sketched in code after this list.)
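
A hedged sketch of both variants, assuming powerTransform comes from the car package and reusing the hypothetical names from the earlier sketches (the Yeo-Johnson family is used here because a plain Box-Cox transform would fail on zeros in pi):

```r
library(car)   # powerTransform(), yjPower()
library(lme4)

## 1. Power-transform the response, then fit an LMM
pt <- powerTransform(dat$pi, family = "yjPower")   # estimate lambda
dat$pi_pt <- yjPower(dat$pi, coef(pt))
m_lmm_pt <- lmer(pi_pt ~ shannon + (shannon | species) + (1 | subject_id / sample_id),
                 data = dat)

## 2. Beta GLMM with zero-inflation and dispersion terms
m_zi <- glmmTMB(pi ~ shannon + (shannon | species) + (1 | subject_id / sample_id),
                ziformula   = ~ 1,        # constant zero-inflation probability
                dispformula = ~ shannon,  # dispersion varies with the predictor
                family = beta_family(link = "logit"),
                data = dat)
```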

When I compare the AIC between these two models, the generalized linear mixed model is more parsimonious than the linear mixed model (dAIC = 92108.5).

[Image: AIC comparison output]

Should I report both models to show that the most parsimonious model validates our hypothesis (positive effect of Shannon on Pi), but that the residual assumptions are not perfectly fulfilled (so the CIs and p-values may be biased)?

The output from the dispformula GLMM is different from the LMM output (I sent you the output in a PDF). I am interested in the Shannon effect (total reads is included to control for sequencing effort, which differs between samples).

I just forgot to mention that we filtered the data, and now Pi is between 0 and 0.1 (maybe that's why the beta distribution is not fitting well).

I switched REML off in the LMM for the AIC comparison, and dAIC is still big: 92085.9.
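
For reference, a sketch of that step, reusing the hypothetical model objects from the sketches above:

```r
# Refit the LMM by maximum likelihood (REML = FALSE) before comparing AIC;
# note the caveat in the next comment about comparing models via AIC at all
m_lmm_ml <- update(m_lmm_pt, REML = FALSE)
AIC(m_lmm_ml, m_zi)
```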

florianhartig avatar Feb 09 '21 15:02 florianhartig

Just for reference: this was the original beta model:

[Image: original beta model specification]

So, just to summarise: if I got everything correctly, your results regarding Shannon are that:

  • the simple LMM suggests no significant effect
  • the beta GLMM suggests a positive effect
  • the beta zi GLMM also suggests a significant positive effect (although it was not clear to me whether you have predictors in the zi term).

As for the AIC: I don't think you can compare continuous with discrete distributions via AIC. Maybe the approach in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4493133/ would help, but I'm not quite sure which is the best way to compare the models.

Overall, I don't see a clear reason to choose one model over the other. Of course, the GLMM is probably "technically" more correct because we have a [0,1] response, and no one will be able to accuse you of doing something wrong if you fit it like that, but the fact that you can't recover the positive effect in a simple LMM would make me a bit suspicious as to whether the positive effect is really robust.

What you could do to explore this a bit further is plot the actual predictions of all 3 models - with the nonlinear link, and the Shannon predictor also in the variance (possibly also in the zi term), it's somewhat unclear to me what an effect size != 0 really means for your GLMMs. What I would suggest is plotting the model predictions on the response scale (using predict() with type = "response"), for all 3 models, against Shannon, to confirm whether the effective effect at the response scale actually differs between the three models. Maybe add the data - do you actually see a positive effect in the data?
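
A hedged sketch of that comparison, reusing the hypothetical model objects and column names from the earlier sketches (depending on the glmmTMB version, newdata may also need to contain the random-effect grouping columns even with re.form = NA):

```r
# Population-level predictions across the observed Shannon range
nd <- data.frame(shannon = seq(min(dat$shannon), max(dat$shannon), length.out = 100))

pred_lmm  <- predict(m_lmm_pt, newdata = nd, re.form = NA)  # on the transformed
                                                            # scale; back-transform
                                                            # before comparing
pred_glmm <- predict(m1,   newdata = nd, type = "response", re.form = NA)
pred_zi   <- predict(m_zi, newdata = nd, type = "response", re.form = NA)

# Overlay the predictions on the raw data
plot(dat$shannon, dat$pi, col = "grey", xlab = "Shannon", ylab = "Pi")
lines(nd$shannon, pred_glmm, col = "red")
lines(nd$shannon, pred_zi,   col = "blue")
```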

florianhartig avatar Feb 09 '21 16:02 florianhartig