pymer4
Comparing self-subsetting models
I was wondering if pymer4 has a way to compare two models varying in the presence/absence of a given variable in the formula. In R, this can easily be done with the anova() command, e.g.:
model1 <- lmer(DV ~ IV1 + (IV1 | Group), data=data)
model2 <- lmer(DV ~ IV1*IV2 + (IV1 + IV2 | Group), data=data)
anova(model1, model2)
The Lmer.anova() command in pymer4 does not seem to do the same, though. Is there a way to accomplish this?
P.S.: I am aware this is not a "bug" strictly speaking, but opening a bug issue on GitHub seems to be the only way to get in touch with the developers to ask this. Thanks!
Unfortunately not at this time. It's on the short-term to-do list, so some time this year given my work commitments: https://trello.com/c/XOxtVjX0
I had started on this functionality with the hope of computing it in pure Python rather than relying on R calls, to keep things a bit speedier, as can be seen in this commented-out code on the master branch: https://github.com/ejolly/pymer4/blob/master/pymer4/stats.py#L552
And this utility function: https://github.com/ejolly/pymer4/blob/master/pymer4/utils.py#L460
If memory serves, I wasn't quite getting results that matched R; some of the stats are probably being computed incorrectly, and this may end up requiring an R call.
Always happy to receive code contributions as PRs!
Thank you for replying. I'd be happy to help, though I fear that my coding skills are not as advanced as we both would like...
I just had a look at the commented-out code for a log-likelihood test, and it is easy to fix it to give the same results as in R (the key is to make sure that REML and ML are used appropriately). In principle, I'm happy to put together a PR.
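Something along these lines is roughly what I have in mind (a sketch only; it assumes Lmer.fit() accepts REML= and summarize= and that .AIC and .logLike are populated after fitting, and it backs the parameter count out of AIC = 2k - 2*logLik rather than counting parameters directly):

from scipy import stats
from pymer4.models import Lmer

def lrt(data, formula_small, formula_big):
    # Refit both models with ML (REML=False), as anova.merMod does before
    # comparing log-likelihoods of nested models.
    m0 = Lmer(formula_small, data=data)
    m1 = Lmer(formula_big, data=data)
    m0.fit(REML=False, summarize=False)
    m1.fit(REML=False, summarize=False)
    # Back out each model's parameter count from AIC = 2k - 2*logLik,
    # assuming .AIC and .logLike are set after fitting.
    k0 = m0.AIC / 2 + m0.logLike
    k1 = m1.AIC / 2 + m1.logLike
    chisq = 2 * (m1.logLike - m0.logLike)
    df_diff = k1 - k0
    pval = stats.chi2.sf(chisq, df_diff)
    return chisq, df_diff, pval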
I made a PR, #109. The proposed implementation matches lme4's anova.merMod in behaviour.
It's a bit tricky to write tests. The ideal scenario would be to run anova.merMod with rpy2 and compare the results; however, that's not possible because anova.merMod raises an error when comparing the models, as it thinks the data are different. This is because it expects the data to be provided to lmer by name, whilst via rpy2 we provide the full dataframe. There is a linked bug report at https://github.com/lme4/lme4/issues/622
I could write a couple of simple tests where we compare to some static values obtained in R, but that means we will not be checking for any changes in the behaviour of R. That's probably the best option, though. @ejolly any thoughts?
Thanks a ton @dramanica! Sorry I haven't had a chance to review as I'm currently traveling, but should be available in about a week or so.
If I recall correctly, there was a similar kind of issue that led me to just compare against static values in the lmm tests; I did that by separately fitting models in R to the included sample_data.csv file. So one thought is to do something similar with that same sample_data.csv file. See for example here.
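Concretely, a test along these lines is what I have in mind (sketch only; the formulas are illustrative and the 12.34 below is a placeholder to be replaced by the chi-square statistic reported by anova() in R for the same two models fit to sample_data.csv):

import os
import numpy as np
import pandas as pd
from pymer4.models import Lmer
from pymer4.utils import get_resource_path

def test_model_comparison_static_values():
    # Fit the same pair of nested models that were fit once in R on the
    # bundled sample_data.csv, then compare against the hard-coded R output.
    df = pd.read_csv(os.path.join(get_resource_path(), "sample_data.csv"))
    small = Lmer("DV ~ IV2 + (IV2 | Group)", data=df)
    big = Lmer("DV ~ IV2 + IV3 + (IV2 | Group)", data=df)
    small.fit(REML=False, summarize=False)
    big.fit(REML=False, summarize=False)
    chisq = 2 * (big.logLike - small.logLike)
    # Placeholder reference value: substitute the statistic printed by
    # anova(small, big) in R.
    assert np.isclose(chisq, 12.34, atol=1e-2)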
No worries, give me a couple of days and I'll add some tests to the PR.
@ejolly Actually, I had a bit of spare time, so tests are now fully implemented. Whilst I was at it, I also added a wrapper to confint for the Lmer class, so that it is now possible to bootstrap and profile the variances of the random components. It's an additional feature that I will need in order to use pymer4 in a course I plan to teach next year.
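Usage would look roughly like this (illustrative only; the argument names follow R's confint.merMod and might differ slightly in the final PR):

import os
import pandas as pd
from pymer4.models import Lmer
from pymer4.utils import get_resource_path

df = pd.read_csv(os.path.join(get_resource_path(), "sample_data.csv"))
model = Lmer("DV ~ IV2 + (IV2 | Group)", data=df)
model.fit(summarize=False)

# Profile-likelihood intervals, including the random-effect standard deviations
profile_cis = model.confint(method="profile")

# Parametric bootstrap intervals (slower, but relies on fewer approximations)
boot_cis = model.confint(method="boot", nsim=500)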