pymer4 icon indicating copy to clipboard operation
pymer4 copied to clipboard

Comparing self-subsetting models

Open robpetrosino opened this issue 4 years ago • 7 comments

I was wondering if pymer4 has a way to compare two models varying in the presence/absence of a given variable in the formula. In R, this can be easily done with the command anova(). E.g.:

model1 <- lmer(DV + IV1 + (IV1 | Group), data=data)
model2 <- lmer(DV + IV1*IV2 + (IV1 + IV2 | Group), data=data)

anova(model1, model2)

The command Lmer.anova() in pymer4 does not seem to do the same, though. Is there a way to accomplish this?

P.S.: I am aware this is not a "bug" properly said, but opening a bug issue on Githus seems to be the only way to get in touch with the developers to ask this. Thanks!

robpetrosino avatar Feb 08 '21 08:02 robpetrosino

Unfortunately not at this time. It's on the short term todo list, so some time this year given my work commitments: https://trello.com/c/XOxtVjX0

I had started this functionality with the hope to compute it in pure Python rather than relying on R calls to keep things a bit speedy as can be seen in this commented code on the master branch: https://github.com/ejolly/pymer4/blob/master/pymer4/stats.py#L552

And this utility function: https://github.com/ejolly/pymer4/blob/master/pymer4/utils.py#L460

If memory serves, I wasn't quite getting matching results with R and I'm sure some of the stats are probably being computed incorrectly and may end up requiring an R call.

Always happy to receive code contributions as PRs!

ejolly avatar Feb 08 '21 23:02 ejolly

Thank you for replying. I'd be happy to help, though I fear that my coding skills are not as advanced as we both would want to...

robpetrosino avatar Feb 09 '21 08:02 robpetrosino

I just had a look at the commented out code for a loglik test, and it is easy to fix it to give the same results as in R (the key is to make sure that we use REML and ML appropriately). In principle, happy to put together a PR.

dramanica avatar Jul 13 '22 17:07 dramanica

I made a PR #109 .The proposed implementation matches the lme4 anova.merMod in behaviour. It's a bit tricky to write tests. The ideal scenario would be to run anova.merMod with rpy2 and compare the results; however, that's not possible as anova.merMod raises an error when comparing models because it thinks that the data are different. This is because it expects data to be provided to lmer by name, whilst via rpy2 we provide the full dataframe. There is a linked bug report in https://github.com/lme4/lme4/issues/622 I could write a couple of simple tests where we compare to some static values obtained in R, but that means we will not be checking for any changes in the behaviour of R. That's probably the best option, though. @ejolly any thoughts?

dramanica avatar Jul 14 '22 11:07 dramanica

Thanks a ton @dramanica! Sorry I haven't had a chance to review as I'm currently traveling, but should be available in about a week or so.

If I recall correctly there was a similar kind of issue that lead me to just compare to static values in the lmm tests; I did that by separately fitting models in R to the included sample_data.csv file. So one thought is to do something similar with the included sampled_data.csv file. See for example here.

ejolly avatar Jul 21 '22 19:07 ejolly

No worries, give me a couple of days and I'll add some tests to the PR.

dramanica avatar Jul 22 '22 14:07 dramanica

@ejolly Actually, I had a bit of spare time, so tests are now fully implemented. Whilst I was at it, I also added a wrapper to confint for the Lmer class, so that it is now possible to bootstrap and profile the variances of the random components. It's an additional feature that I will need to use pymer4 in a course I plan to teach next year.

dramanica avatar Jul 22 '22 17:07 dramanica