When the response of a model is matrix-like, get_response() returns it as a data frame with the correct names. However, get_data() returns the response as a nested matrix column inside the data frame:

library(lme4)
#> Loading required package: Matrix
(gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
              data = cbpp, family = binomial))
#> Generalized linear mixed model fit by maximum likelihood (Laplace
#>   Approximation) [glmerMod]
#>  Family: binomial  ( logit )
#> Formula: cbind(incidence, size - incidence) ~ period + (1 | herd)
#>    Data: cbpp
#>      AIC      BIC   logLik deviance df.resid 
#> 194.0531 204.1799 -92.0266 184.0531       51 
#> Random effects:
#>  Groups Name        Std.Dev.
#>  herd   (Intercept) 0.6421  
#> Number of obs: 56, groups:  herd, 15
#> Fixed Effects:
#> (Intercept)      period2      period3      period4  
#>     -1.3983      -0.9919      -1.1282      -1.5797
insight::get_response(gm1) |> head()
#>   incidence size
#> 1         2   14
#> 2         3   12
#> 3         4    9
#> 4         0    5
#> 5         3   22
#> 6         1   18
insight::get_data(gm1) |> head()
#>   cbind(incidence, size - incidence).incidence
#> 1                                            2
#> 2                                            3
#> 3                                            4
#> 4                                            0
#> 5                                            3
#> 6                                            1
#>   cbind(incidence, size - incidence).V2 period herd incidence size
#> 1                                    12      1    1         2   14
#> 2                                     9      2    1         3   12
#> 3                                     5      3    1         4    9
#> 4                                     5      4    1         0    5
#> 5                                    19      1    2         3   22
#> 6                                    17      2    2         1   18
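For readers less familiar with matrix columns, here is a minimal base-R sketch of the structure shown above. The column name `resp` and the three rows of data are illustrative stand-ins for the `cbpp` values, not what insight uses internally. The whole two-column matrix sits in a single data-frame column, so code that expects one plain vector per column may not handle it.

```r
# Illustrative data: first three rows of incidence/size from the output above
d <- data.frame(incidence = c(2, 3, 4), size = c(14, 12, 9))

# Attach the two-column response as ONE matrix-valued column,
# mirroring the cbind() response column that get_data() returned
d$resp <- cbind(incidence = d$incidence, d$size - d$incidence)

ncol(d)            # 3: the matrix occupies a single column slot
is.matrix(d$resp)  # TRUE
dim(d$resp)        # 3 2: two columns hidden inside that slot
```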

This produces errors when the data are used by other functions, for example:

library(lme4)
#> Loading required package: Matrix
(gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
              data = cbpp, family = binomial))
#> Generalized linear mixed model fit by maximum likelihood (Laplace
#>   Approximation) [glmerMod]
#>  Family: binomial  ( logit )
#> Formula: cbind(incidence, size - incidence) ~ period + (1 | herd)
#>    Data: cbpp
#>      AIC      BIC   logLik deviance df.resid 
#> 194.0531 204.1799 -92.0266 184.0531       51 
#> Random effects:
#>  Groups Name        Std.Dev.
#>  herd   (Intercept) 0.6421  
#> Number of obs: 56, groups:  herd, 15
#> Fixed Effects:
#> (Intercept)      period2      period3      period4  
#>     -1.3983      -0.9919      -1.1282      -1.5797
modelbased::estimate_expectation(gm1, include_random = TRUE)
#> Error in cbind(incidence, size - incidence): object 'incidence' not found

@DominiqueMakowski, see the problem this is causing with estimate_prediction().

Indeed, we should probably add a step to get_data() to sanitize the output, right?

Yeah, that would be good.

Both columns, incidence and size, are present in the returned data frame, so I'm not sure this is an issue with get_data().

Any comments on my comment? :-)

At a minimum, the response matrix-column probably shouldn't be there.
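A sketch of what such a sanitizing step could look like, using a hypothetical helper `drop_matrix_columns()` (not an insight function). It assumes, as in this model, that the variables used inside `cbind()` are already present as ordinary columns, so dropping the matrix column loses nothing.

```r
# Hypothetical helper, not part of insight: remove matrix-valued columns
# (such as a cbind() response) and keep only ordinary vector columns
drop_matrix_columns <- function(data) {
  is_mat <- vapply(data, is.matrix, logical(1))
  data[, !is_mat, drop = FALSE]
}

# Small example mimicking get_data() output with a matrix response column
d <- data.frame(period = factor(c(1, 2, 3)),
                incidence = c(2, 3, 4),
                size = c(14, 12, 9))
d$resp <- cbind(incidence = d$incidence, d$size - d$incidence)

clean <- drop_matrix_columns(d)
names(clean)  # "period" "incidence" "size"
```

Expanding the matrix into separate columns would be the alternative, but for this model it would only duplicate incidence and size.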

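This is not a get_data() issue: the returned data frame contains only plain columns (no matrix column), yet estimate_expectation() still fails: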

library(lme4)
#> Loading required package: Matrix
(gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
              data = cbpp, family = binomial))
#> Generalized linear mixed model fit by maximum likelihood (Laplace
#>   Approximation) [glmerMod]
#>  Family: binomial  ( logit )
#> Formula: cbind(incidence, size - incidence) ~ period + (1 | herd)
#>    Data: cbpp
#>      AIC      BIC   logLik deviance df.resid 
#> 194.0531 204.1799 -92.0266 184.0531       51 
#> Random effects:
#>  Groups Name        Std.Dev.
#>  herd   (Intercept) 0.6421  
#> Number of obs: 56, groups:  herd, 15
#> Fixed Effects:
#> (Intercept)      period2      period3      period4  
#>     -1.3983      -0.9919      -1.1282      -1.5797

insight::get_data(gm1) |> str()
#> 'data.frame':    56 obs. of  4 variables:
#>  $ period   : Factor w/ 4 levels "1","2","3","4": 1 2 3 4 1 2 3 1 2 3 ...
#>  $ herd     : Factor w/ 15 levels "1","2","3","4",..: 1 1 1 1 2 2 2 3 3 3 ...
#>  $ incidence: num  2 3 4 0 3 1 1 8 2 0 ...
#>  $ size     : num  14 12 9 5 22 18 21 22 16 16 ...
insight::get_data(gm1) |> head()
#>   period herd incidence size
#> 1      1    1         2   14
#> 2      2    1         3   12
#> 3      3    1         4    9
#> 4      4    1         0    5
#> 5      1    2         3   22
#> 6      2    2         1   18

modelbased::estimate_expectation(gm1, include_random = TRUE)
#> Error in cbind(incidence, size - incidence): object 'incidence' not found
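Consistent with this, the response expression itself evaluates fine against a data frame that contains `incidence` and `size`, and fails with the same message when `incidence` is missing. The base-R sketch below (three illustrative rows, formula copied from the model) suggests, without confirming, that the data used at the failing step lacks the raw response variables; the "prediction grid" in it is an assumed example, not modelbased's actual internal object.

```r
# Formula with the same matrix-like response as the model above
f <- cbind(incidence, size - incidence) ~ period
lhs <- f[[2]]  # the call cbind(incidence, size - incidence)

# Data containing both raw variables: the response evaluates to a 2-column matrix
d_ok <- data.frame(incidence = c(2, 3, 4), size = c(14, 12, 9))
resp <- eval(lhs, envir = d_ok)
dim(resp)  # 3 2

# Assumed example: a prediction-grid-like data frame without 'incidence'
d_grid <- data.frame(period = factor(1:4), size = 10)
msg <- tryCatch(eval(lhs, envir = d_grid), error = conditionMessage)
msg  # object 'incidence' not found
```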

