metafor Accept data.frame in predict()

Accept data.frame in predict()

Open bwiernik opened this issue 2 years ago • 8 comments

Classification:

Enhancement Suggestion

Summary

I'm trying to fit an rma.mv() model with a fairly complex moderator structure (factor variable with several levels and splines for a continuous variable, interaction effects to model the different outcomes). Given this complexity, I'm having a really hard time trying to construct the model matrix for newmods by hand. It would be much easier and less error-prone if predict.rma() accepted a data frame for the newmods argument the same way most R modeling functions do.

Reproducible Example (if applicable)

library(metafor)
#> Loading required package: Matrix
#> 
#> Loading the 'metafor' package (version 3.1-3). For an
#> introduction to the package please type: help(metafor)
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
mod <- rma(yi ~ ablat + year, vi, data=dat)
predict(mod, newmods = dat[1:4,])
#> Error in predict.rma(mod, newmods = dat[1:4, ]): Argument 'newmods' should be a vector or matrix, but is of class 'escalc'.Argument 'newmods' should be a vector or matrix, but is of class 'data.frame'.

cf. predict.lm():

predict(lm(mpg ~ disp, data = mtcars), newdata = mtcars[1:5,])
#>         Mazda RX4     Mazda RX4 Wag        Datsun 710    Hornet 4 Drive 
#>          23.00544          23.00544          25.14862          18.96635 
#> Hornet Sportabout 
#>          14.76241

Aug 28 '21 15:08 bwiernik

predict.rma() is such an unholy mess that I barely dare to touch it :exploding_head:

But this aside, there are actually two things here:

First, dat[1:4,] contains variables that are not actually part of the moderators, so those should then be (silently?) ignored. So, at the moment, while

predict(mod, newmods = as.matrix(dat[1:4,c(3,8)]))

works,

predict(mod, newmods = as.matrix(dat[1:4,c(1,3,8)]))

also throws an error. One can debate this implementation. It is less convenient, but it can also be argued to provide a certain 'safeguard' (one should only put moderators into newmods that are actually in the model). I will have to think about how things are implemented in predict.rma() to see whether this requirement can be dropped though.

The other thing is the class of what you put into newmods. That indeed must be a matrix (or vector) at the moment. I can't just do a if (is.data.frame(newmods)) newmods <- as.matrix(newmods) since as.matrix(dat[1:4,]) would just end up being an all character matrix.

As a general workflow that deals with these two issues automatically, you could do however:

mod <- rma(yi, vi, mods = ~ ablat + year, data=dat)
predict(mod, newmods = model.matrix(formula(mod), dat[1:4,])[,-1])

That is also how you can deal with the fact that predict.rma() does not create the model matrix for you -- the user has to do this directly (that's a design decision that I regret, but it's way too late to change this now). So, for example, one has to do:

mod <- rma(yi, vi, mods = ~ ablat + year + I(ablat^2), data=dat)
predict(mod, newmods = c(44,1948,44^2))

(which gets really tedious when you have interactions, factors, and other 'derived' terms in your model), but one can also just do:

predict(mod, newmods = model.matrix(formula(mod), dat[1,])[,-1])

(the [,-1] is to get rid of the (Intercept) term, since that gets added by predict.rma() and shouldn't be in newmods -- another regrettable design decision).

Note that for splines, you have to be careful. The knot positions that were used in the model should obviously be the same as those used in newmods, so you should check carefully that model.matrix() is doing that. See also this example to illustrate this point: https://www.metafor-project.org/doku.php/tips:non_linear_meta_regression#spline_model

Aug 30 '21 10:08 wviechtb

The way predict works for most R modeling functions is that it takes the data frame supplied and then evaluates the model formula or terms with that data frame as the environment. So, for example, I would absolutely expect to be able to provide a data frame with unused variables or a factor variable that is not already dummy coded, etc.

Aug 30 '21 11:08 bwiernik

Yeah, that's just not how things work in metafor. When I wrote rma() (this goes back to code I wrote around 2000), I did not have such a deep understanding of things like model frames, terms, and so on, and hence did not make use of this 'machinery'. As I said, it's a regrettable design choice I made a long time ago and I cannot really fix this now or break tons of code. Using model.matrix(formula(<model>), <some data frame>)[,-1] is a workaround.

Aug 30 '21 11:08 wviechtb

Hey @wviechtb I've pulled together some functions to help construct a model matrix from a data frame in newmods https://gist.github.com/bwiernik/9602623e092662bcc70e9b37f78c838b

Now that the modeling objects return the data argument (https://github.com/bwiernik/metafor/commit/1c571656e7e482d9db8026523c2b9faf5d423019), I could simplify these a bit, and they could work natively inside predct.rma(). Would you be open to a PR for that?

Related to this, it would be really helpful if the rma object could also store xlev and contrasts slots for factor variables in the model.

Feb 13 '22 17:02 bwiernik

Hi @bwiernik. Thanks for those suggestions. Some thoughts:

get_tau_ci() essentially does confint()$random[2,]. Don't quite see the added benefit, except that one doesn't have to dig into the object returned by confint() to figure out how to pull the stuff out of it. I would rather stick to known idioms and possibly consider an as.data.frame method for confint.rma objects that works appropriately not just for rma.uni objects but also when used for example on rma.mv objects.
clean_rma_predictions(predict(res)) made me aware that as.data.frame(predict(res)) was also returning the cr.lb and cr.ub elements, which are now superseded by pi.lb and pi.ub. Thanks! I will update the as.data.frame() method to not do that, so that takes care of that.
Most importantly, create_newmods(). I really can't change how predict.rma() works at this point as this would break tons of code. As a helper function, this is something to consider, although not at the cost of adding two dependencies.

Feb 23 '22 14:02 wviechtb

Let me clarify a few things:

Yes, this is a just a helper for my students
Great!
If I were to make a pull request, I would of course write it without the dependencies. I am not exactly following your concern about breaking predict.rma(). Incorporating create_newmods() wouldn't require changing any of the core functioning of predict.rma(). This could be added at the top of the function, where currently there is a check that newmods is a matrix. Rather than erroring, the function could check if newmods is a data frame and then construct the needed matrix. This shouldn't break any other code because it is all happening upstream of the actual function operations, which would still only ever see the expected matrix.

An even safer alternative might be to add a newdata argument that is explicitly intended to be a data frame, from which newmods and newscale can be constructed. That would definitely not collide with any existing code.

Besides those, what do you think of adding the xlev and contrasts information to the rma object?

Feb 23 '22 15:02 bwiernik

Ok, but then I would rather go with a as.data.frame method.
Done.
Good point. Okay, I see this as a possibility now. I also just realized that formula() with type="mods" (the default) should always work, even if the formula is specified via yi (i.e., it should then just get the part after ~). Then one wouldn't have to do the is.null(ma_obj$formula.yi) check. I just fixed this. Adding xlevels and contrasts should also be possible, although how this is done requires some thought. Subsetting/removal of missings can affect things. Also, what if the user doesn't use data or has specified the model matrix directly to mods? This will all require some thought.

Feb 23 '22 18:02 wviechtb

Also 1. is done now. Will eventually tackle 3.

Mar 26 '22 20:03 wviechtb

metafor metafor copied to clipboard

Accept data.frame in predict()

Classification:

Summary

Reproducible Example (if applicable)

metafor
metafor copied to clipboard