metafor
metafor copied to clipboard
Accept data.frame in predict()
Classification:
Enhancement Suggestion
Summary
I'm trying to fit an rma.mv()
model with a fairly complex moderator structure (factor variable with several levels and splines for a continuous variable, interaction effects to model the different outcomes). Given this complexity, I'm having a really hard time trying to construct the model matrix for newmods
by hand. It would be much easier and less error-prone if predict.rma()
accepted a data frame for the newmods
argument the same way most R modeling functions do.
Reproducible Example (if applicable)
library(metafor)
#> Loading required package: Matrix
#>
#> Loading the 'metafor' package (version 3.1-3). For an
#> introduction to the package please type: help(metafor)
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
mod <- rma(yi ~ ablat + year, vi, data=dat)
predict(mod, newmods = dat[1:4,])
#> Error in predict.rma(mod, newmods = dat[1:4, ]): Argument 'newmods' should be a vector or matrix, but is of class 'escalc'.Argument 'newmods' should be a vector or matrix, but is of class 'data.frame'.
cf. predict.lm()
:
predict(lm(mpg ~ disp, data = mtcars), newdata = mtcars[1:5,])
#> Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive
#> 23.00544 23.00544 25.14862 18.96635
#> Hornet Sportabout
#> 14.76241
predict.rma()
is such an unholy mess that I barely dare to touch it :exploding_head:
But this aside, there are actually two things here:
First, dat[1:4,]
contains variables that are not actually part of the moderators, so those should then be (silently?) ignored. So, at the moment, while
predict(mod, newmods = as.matrix(dat[1:4,c(3,8)]))
works,
predict(mod, newmods = as.matrix(dat[1:4,c(1,3,8)]))
also throws an error. One can debate this implementation. It is less convenient, but it can also be argued to provide a certain 'safeguard' (one should only put moderators into newmods
that are actually in the model). I will have to think about how things are implemented in predict.rma()
to see whether this requirement can be dropped though.
The other thing is the class of what you put into newmods
. That indeed must be a matrix (or vector) at the moment. I can't just do a if (is.data.frame(newmods)) newmods <- as.matrix(newmods)
since as.matrix(dat[1:4,])
would just end up being an all character matrix.
As a general workflow that deals with these two issues automatically, you could do however:
mod <- rma(yi, vi, mods = ~ ablat + year, data=dat)
predict(mod, newmods = model.matrix(formula(mod), dat[1:4,])[,-1])
That is also how you can deal with the fact that predict.rma()
does not create the model matrix for you -- the user has to do this directly (that's a design decision that I regret, but it's way too late to change this now). So, for example, one has to do:
mod <- rma(yi, vi, mods = ~ ablat + year + I(ablat^2), data=dat)
predict(mod, newmods = c(44,1948,44^2))
(which gets really tedious when you have interactions, factors, and other 'derived' terms in your model), but one can also just do:
predict(mod, newmods = model.matrix(formula(mod), dat[1,])[,-1])
(the [,-1]
is to get rid of the (Intercept)
term, since that gets added by predict.rma()
and shouldn't be in newmods
-- another regrettable design decision).
Note that for splines, you have to be careful. The knot positions that were used in the model should obviously be the same as those used in newmods
, so you should check carefully that model.matrix()
is doing that. See also this example to illustrate this point: https://www.metafor-project.org/doku.php/tips:non_linear_meta_regression#spline_model
The way predict works for most R modeling functions is that it takes the data frame supplied and then evaluates the model formula or terms with that data frame as the environment. So, for example, I would absolutely expect to be able to provide a data frame with unused variables or a factor variable that is not already dummy coded, etc.
Yeah, that's just not how things work in metafor. When I wrote rma()
(this goes back to code I wrote around 2000), I did not have such a deep understanding of things like model frames, terms, and so on, and hence did not make use of this 'machinery'. As I said, it's a regrettable design choice I made a long time ago and I cannot really fix this now or break tons of code. Using model.matrix(formula(<model>), <some data frame>)[,-1]
is a workaround.
Hey @wviechtb I've pulled together some functions to help construct a model matrix from a data frame in newmods
https://gist.github.com/bwiernik/9602623e092662bcc70e9b37f78c838b
Now that the modeling objects return the data argument (https://github.com/bwiernik/metafor/commit/1c571656e7e482d9db8026523c2b9faf5d423019), I could simplify these a bit, and they could work natively inside predct.rma()
. Would you be open to a PR for that?
Related to this, it would be really helpful if the rma object could also store xlev
and contrasts
slots for factor variables in the model.
Hi @bwiernik. Thanks for those suggestions. Some thoughts:
-
get_tau_ci()
essentially doesconfint()$random[2,]
. Don't quite see the added benefit, except that one doesn't have to dig into the object returned byconfint()
to figure out how to pull the stuff out of it. I would rather stick to known idioms and possibly consider anas.data.frame
method forconfint.rma
objects that works appropriately not just forrma.uni
objects but also when used for example onrma.mv
objects. -
clean_rma_predictions(predict(res))
made me aware thatas.data.frame(predict(res))
was also returning thecr.lb
andcr.ub
elements, which are now superseded bypi.lb
andpi.ub
. Thanks! I will update theas.data.frame()
method to not do that, so that takes care of that. - Most importantly,
create_newmods()
. I really can't change howpredict.rma()
works at this point as this would break tons of code. As a helper function, this is something to consider, although not at the cost of adding two dependencies.
Let me clarify a few things:
-
Yes, this is a just a helper for my students
-
Great!
-
If I were to make a pull request, I would of course write it without the dependencies. I am not exactly following your concern about breaking
predict.rma()
. Incorporatingcreate_newmods()
wouldn't require changing any of the core functioning ofpredict.rma()
. This could be added at the top of the function, where currently there is a check thatnewmods
is a matrix. Rather than erroring, the function could check ifnewmods
is a data frame and then construct the needed matrix. This shouldn't break any other code because it is all happening upstream of the actual function operations, which would still only ever see the expected matrix.
An even safer alternative might be to add a newdata
argument that is explicitly intended to be a data frame, from which newmods
and newscale
can be constructed. That would definitely not collide with any existing code.
Besides those, what do you think of adding the xlev
and contrasts
information to the rma
object?
- Ok, but then I would rather go with a
as.data.frame
method. - Done.
- Good point. Okay, I see this as a possibility now. I also just realized that
formula()
withtype="mods"
(the default) should always work, even if the formula is specified viayi
(i.e., it should then just get the part after~
). Then one wouldn't have to do theis.null(ma_obj$formula.yi)
check. I just fixed this. Addingxlevels
andcontrasts
should also be possible, although how this is done requires some thought. Subsetting/removal of missings can affect things. Also, what if the user doesn't usedata
or has specified the model matrix directly tomods
? This will all require some thought.
Also 1. is done now. Will eventually tackle 3.