panelr predict() at specific values (newdata)

I can't get predict() working with newdata from your example. What am I doing wrong?

library(panelr)
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model <- wbm(lwage ~ lag(union) + wks, data = wages)

predict(model, newdata = data.frame(union = 1:4, wks = 40, lwage = 50))
#> Error in eval(predvars, data, env): object 'id' not found

predict(model, newdata = data.frame(
  union = 1:4,
  wks = 40,
  lwage = 50,
  id = 1,
  t = 5
))
#> Error in eval(predvars, data, env): object 'id' not found

However, taking my example from the last issue, it works...

library(panelr)
#> Loading required package: lme4
#> Loading required package: Matrix
load(url("https://github.com/strengejacke/mixed-models-snippets/raw/master/example.RData"))
pd <- panel_data(d, id = ID, wave = time)
m <- wbm(QoL ~ x_tv | age + z1_ti + z2_ti + time  | (time | ID), data = pd)
predict(m, newdata = data.frame(
  QoL = 70,
  x_tv = seq(-5, 5, 1), 
  age = mean(pd$age),
  z1_ti = 1,
  z2_ti = 2000,
  time = 2,
  ID = 5
))
#>        1        2        3        4        5        6        7        8 
#> 75.25958 71.52489 67.79020 64.05551 60.32081 56.58612 52.85143 49.11674 
#>        9       10       11 
#> 45.38204 41.64735 37.91266

Created on 2019-05-23 by the reprex package (v0.3.0)

May 23 '19 10:05 strengejacke

Thanks for reporting this. The short version is that it was expecting a panel_data object, but I don't think it should (implicitly) require it. I've made some changes that should fix this.

As another note, I'm open to feedback on the predict() interface. Since new variables are created by wbm() and friends, it isn't obvious whether users will expect by default that newdata resembles the data argument to wbm() or the internally processed data passed to (g)lmer(). As it is, I give the user the option but I don't know what the best default behavior is.
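To make the two possible shapes of newdata concrete, here is a rough base-R sketch of the kind of derived variables a within-between specification involves. The toy data and the column names wks_mean and wks_within are made up for illustration; they are not panelr's internal names, and wbm() may process the data differently.

```r
# Toy long-format panel: 2 individuals, 3 waves each (made-up values)
raw <- data.frame(
  id  = rep(1:2, each = 3),
  t   = rep(1:3, times = 2),
  wks = c(40, 42, 44, 30, 30, 36)
)

# "Processed" data adds derived columns to the raw input
processed <- raw

# Between component: each individual's mean of the time-varying predictor
# (hypothetical column name; ave() uses mean() by default)
processed$wks_mean <- ave(raw$wks, raw$id)

# Within component: deviation from the individual's own mean
# (hypothetical column name)
processed$wks_within <- raw$wks - processed$wks_mean

print(processed)
```

A newdata that resembles raw only needs id, t and wks; one that resembles processed would also need the derived columns, which the user has no obvious way of knowing about.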

May 23 '19 16:05 jacob-long

> The short version is that it was expecting a panel_data object, but I don't think it should (implicitly) require it.

I think for other packages, like ggeffects, it's easier to work with simple data frames, unless you return the complete panel_data object via standard methods like model.frame().

> Since new variables are created by wbm() and friends,

This was somewhat confusing to me, because when I read the error message I was not sure which variables are required in newdata. As a normal user, I would expect not to have to provide more variables than I used to fit the model (the internal data transformation is hidden from the user, so I don't know what new variables are created). Furthermore, it seems that the response variable (DV) is required as well, which is not the case for most other predict() methods. It would be great if you either provided some details in the docs on what is required, or made the behaviour and usage of predict() consistent with other predict() methods. In short, I would vote for newdata to resemble a data frame that contains the variables I used to fit the model, i.e. those I used in the formula (possibly excluding the DV?).
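As an illustration of "the variables used in the formula, excluding the DV", base R can list them directly from a formula. This is only a sketch using functions from the stats and base packages; it is not necessarily how panelr determines what newdata must contain, and it does not cover the id and wave variables, which are not part of this formula.

```r
# The formula from the failing example above
f <- lwage ~ lag(union) + wks

# Drop the response, then collect the variable names that remain
predictors <- all.vars(delete.response(terms(f)))

print(predictors)
```

Note that all.vars() returns the underlying variable (union) rather than the transformed term (lag(union)), which matches what a user would naturally put into newdata.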

From what I have seen in my example that works, my impression is that it's very easy to compute marginal effects from panelr models (at least models of class wblm), which is really a great advantage. In general, the usage is quite straightforward!

May 23 '19 19:05 strengejacke

Any news on this? Can you estimate if/when you will find time to address this issue, or whether you think there's anything to address at all from your side?

Jun 10 '19 20:06 strengejacke

At this point, the version I'm going to submit to CRAN in the coming days addresses the problem of requiring the DV to be part of newdata.

As you noted, I figured the most intuitive newdata should look like the input data; that is, the user is not expected to create the extra variables created by wbm(). They can if they want by using the raw argument, which I consider an option for advanced users.

This example fails

library(panelr)
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model <- wbm(lwage ~ lag(union) + wks, data = wages)

predict(model, newdata = data.frame(union = 1:4, wks = 40, lwage = 50))
#> Error in eval(predvars, data, env): object 'id' not found

because I'm following the lme4 convention of requiring the grouping variables to be present in newdata unless re.form = ~0. We could quibble over whether lme4 has the right default here, of course. I suspect you've encountered this issue when designing your ggeffects package.
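The lme4 convention mentioned here can be seen with a plain lmer() fit. The sketch below uses the sleepstudy data that ships with lme4 rather than a panelr model, so it only illustrates the lme4 behaviour, not panelr's predict() method.

```r
library(lme4)

# Random-intercept model on the sleepstudy data that ships with lme4
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# newdata that omits the grouping variable "Subject"
nd <- data.frame(Days = 0:3)

# Default (re.form = NULL): predictions condition on the random effects,
# so the grouping variable is needed and this call errors
res <- try(predict(fit, newdata = nd), silent = TRUE)
print(inherits(res, "try-error"))

# re.form = ~0 (or NA) drops the random effects, so population-level
# predictions work without the grouping variable
pop <- predict(fit, newdata = nd, re.form = ~0)
print(pop)
```

With the random effects dropped, the predictions reduce to the fixed-effects part of the model, i.e. the intercept plus the Days slope times each value of Days.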

Jul 11 '19 18:07 jacob-long