
get_predicted_ci for mlm models

Open mattansb opened this issue 4 years ago • 13 comments

m <- lm(cbind(mpg, hp, drat) ~ factor(cyl) + wt, data = mtcars)

insight::get_predicted_ci(m, predict(m))
#> Error in mm %*% vcovmat: non-conformable arguments

(Once this is fixed, this can also be applied to afex_aov models. #400)
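For context, a likely cause of the error is a dimension mismatch. For an mlm fit, vcov() returns one block of coefficient covariances per response, so it has more rows than the model matrix has columns. A minimal base-R check (the names mm and vc are only illustrative, and this assumes the internal computation is the mm %*% vcovmat product shown in the error message):

```r
m <- lm(cbind(mpg, hp, drat) ~ factor(cyl) + wt, data = mtcars)

mm <- model.matrix(m)  # one row per observation, one column per coefficient
vc <- vcov(m)          # one row/column per response-by-coefficient pair

dim(mm)
#> [1] 32  4
dim(vc)
#> [1] 12 12

# The inner dimensions differ (4 vs. 12), so mm %*% vc is non-conformable
ncol(mm) == nrow(vc)
#> [1] FALSE
```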

mattansb avatar Jul 18 '21 05:07 mattansb

m <- lm(cbind(mpg, hp, drat) ~ factor(cyl) + wt, data = mtcars)
predict(m)
#>                          mpg        hp     drat
#> Mazda RX4           21.33650 121.76515 3.701569
#> Mazda RX4 Wag       20.51907 122.03216 3.642144
#> Datsun 710          26.55377  82.67225 4.062922
#> Hornet 4 Drive      19.42916 122.38818 3.562909
#> Hornet Sportabout   16.89262 208.62872 3.359606
#> Valiant             18.64379 122.64473 3.505814
#> Duster 360          16.47590 208.76485 3.329311
#> Merc 240D           23.76489  83.58325 3.860176
#> Merc 230            23.89311  83.54136 3.869497
#> Merc 280            18.70790 122.62378 3.510475
#> Merc 280C           18.70790 122.62378 3.510475
#> Merc 450SE          14.87309 209.28841 3.212790
#> Merc 450SL          15.96300 208.93239 3.292024
#> Merc 450SLC         15.80272 208.98474 3.280372
#> Cadillac Fleetwood  11.09046 210.52401 2.937800
#> Lincoln Continental 10.53269 210.70621 2.897251
#> Chrysler Imperial   10.78593 210.62348 2.915661
#> Fiat 128            26.93844  82.54660 4.090887
#> Honda Civic         28.81373  81.93403 4.227217
#> Toyota Corolla      28.10849  82.16440 4.175947
#> Toyota Corona       26.08896  82.82408 4.029131
#> Dodge Challenger    16.63618 208.71249 3.340963
#> AMC Javelin         16.90865 208.62349 3.360771
#> Camaro Z28          15.61038 209.04757 3.266389
#> Pontiac Firebird    15.59435 209.05280 3.265224
#> Fiat X1-9           27.78793  82.26911 4.152643
#> Porsche 914-2       27.13078  82.48377 4.104870
#> Lotus Europa        29.14070  81.82723 4.250987
#> Ford Pantera L      17.75814 208.34600 3.422527
#> Ferrari Dino        20.85566 121.92221 3.666613
#> Maserati Bora       16.47590 208.76485 3.329311
#> Volvo 142E          25.07919  83.15393 3.955723

Created on 2021-07-18 by the reprex package (v2.0.0)

what should we do in this case? return one prediction vector for each response?

DominiqueMakowski avatar Jul 18 '21 06:07 DominiqueMakowski

Now that get_predicted_ci is a method (get_predicted_ci.default), we could perhaps sapply it on mlm objects? I have no idea how mlm objects work.

DominiqueMakowski avatar Jul 18 '21 07:07 DominiqueMakowski

I have no answers for you, just more questions...

mattansb avatar Jul 18 '21 07:07 mattansb

Where do we use get_predicted_ci() internally?

bwiernik avatar Jul 18 '21 12:07 bwiernik

https://github.com/easystats/insight/blob/34af319fa233d6611fa7e40013f899343a558293/R/get_predicted.R#L189-L194

https://github.com/easystats/insight/blob/34af319fa233d6611fa7e40013f899343a558293/R/get_predicted.R#L262

As a matter of fact, I consider get_predicted_ci almost a semi-internal function, since I don't expect many users to call it directly (unless they want to compute different CIs for the same predictions without re-running the predictions, or for some other specific use cases).

DominiqueMakowski avatar Jul 18 '21 13:07 DominiqueMakowski

Yeah, that was my understanding of it. I like that it's exported for that case (say to visualize 50, 80, 95, 99 intervals).
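As a rough illustration of that use case, here is a base-R sketch that computes several interval widths from a single set of predictions. It uses predict(..., se.fit = TRUE) on a univariate model rather than get_predicted_ci() itself, and the column names are only illustrative:

```r
m <- lm(mpg ~ factor(cyl) + wt, data = mtcars)

# Predict once, then reuse the fitted values and standard errors
p <- predict(m, se.fit = TRUE)

ci_levels <- c(0.50, 0.80, 0.95, 0.99)

intervals <- do.call(rbind, lapply(ci_levels, function(ci) {
  crit <- qt((1 + ci) / 2, df = p$df)  # two-sided critical t value
  data.frame(
    CI = ci,
    Predicted = p$fit,
    CI_low = p$fit - crit * p$se.fit,
    CI_high = p$fit + crit * p$se.fit,
    row.names = NULL
  )
}))

head(intervals, 3)
```

The result is a long data frame with one block of rows per interval level, which is convenient for layering ribbons in a plot.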

I see 3 options for a structure here:

  • array--nope
  • list of data frames, one for each response, each with same structure as a univariate response
  • stacked data frame with an added response column

Which of the last two do you think would be most intuitive and useful, both for us internally and for users who might call it directly?

bwiernik avatar Jul 18 '21 15:07 bwiernik

3 can easily be turned into 2 w/ split, but you'd also need to have an index column for getting each multivariate prediction... Errrrr

mattansb avatar Jul 18 '21 18:07 mattansb

What do you mean by index there (versus response)?

bwiernik avatar Jul 18 '21 21:07 bwiernik

Oh you mean if keep_iter is TRUE? Yeah. That wouldn't be too bad?

bwiernik avatar Jul 18 '21 21:07 bwiernik

I don't know what keep_iter is... But I mean that each prediction is a multi-dimensional point:

m <- lm(cbind(mpg, hp) ~ cyl, mtcars)
predict(m) |> head()
#>                        mpg        hp
#> Mazda RX4         20.62984 140.69532
#> Mazda RX4 Wag     20.62984 140.69532
#> Datsun 710        26.38142  76.77876
#> Hornet 4 Drive    20.62984 140.69532
#> Hornet Sportabout 14.87826 204.61188
#> Valiant           20.62984 140.69532

Created on 2021-07-19 by the reprex package (v2.0.0)

So if we re-shape this, we need to be able to tell which values from y1 go with which values from y2 ... yk.
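To make the concern concrete, here is a sketch of the stacked layout with an explicit index. The column names Row, Response and Predicted are hypothetical, not an existing insight convention:

```r
m <- lm(cbind(mpg, hp) ~ cyl, data = mtcars)
pred <- predict(m)

# Stack the prediction matrix column by column into long format.
# Row records which observation each value belongs to.
stacked <- data.frame(
  Row = rep(seq_len(nrow(pred)), times = ncol(pred)),
  Response = rep(colnames(pred), each = nrow(pred)),
  Predicted = as.vector(pred)
)

# Stacked data frame -> list of data frames, one per response
by_response <- split(stacked, stacked$Response)

# The Row column lets us recover the multivariate prediction for observation 1
stacked[stacked$Row == 1, ]
```

Without the Row column, the pairing between responses would rely only on row order within each block.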

mattansb avatar Jul 19 '21 04:07 mattansb

I'm leaning toward a list, then we don't need to add any sort of index column, and each element can just be a typical univariate predict_ci object with appropriate classes.
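A minimal sketch of what the list structure could look like, using only base R. The helper predicted_ci_mlm() and its column names are hypothetical and not part of insight. It assumes vcov() on an mlm orders its rows response by response (all coefficients for the first response, then the second, and so on):

```r
m <- lm(cbind(mpg, hp, drat) ~ factor(cyl) + wt, data = mtcars)

# Hypothetical helper: one confidence-interval data frame per response
predicted_ci_mlm <- function(model, ci = 0.95) {
  mm <- model.matrix(model)
  vc <- vcov(model)
  pred <- predict(model)
  p <- ncol(mm)
  crit <- qt((1 + ci) / 2, df = df.residual(model))

  out <- lapply(seq_len(ncol(pred)), function(j) {
    # Coefficient covariance block for response j (assumed ordering, see above)
    idx <- ((j - 1) * p + 1):(j * p)
    block <- vc[idx, idx, drop = FALSE]
    # Row-wise quadratic form x' V x gives the variance of each fitted value
    se <- sqrt(rowSums((mm %*% block) * mm))
    data.frame(
      Predicted = pred[, j],
      SE = se,
      CI_low = pred[, j] - crit * se,
      CI_high = pred[, j] + crit * se
    )
  })
  names(out) <- colnames(pred)
  out
}

ci_list <- predicted_ci_mlm(m)
names(ci_list)
head(ci_list$mpg, 3)
```

Each element has the same shape as a univariate result, so it should agree with predict(lm(mpg ~ factor(cyl) + wt, data = mtcars), interval = "confidence") for the mpg response.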

bwiernik avatar Jul 19 '21 07:07 bwiernik

The question is, how will other functions that work with the results deal with a list?

mattansb avatar Jul 19 '21 07:07 mattansb

Give it its own class, whose method is "lapply the generic over the list". We would need a new class for either approach.
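Roughly, the idea could be sketched like this in S3. The class name predicted_list and the generic ci_width() are made up for illustration, and the per-response data frames are built by refitting each response with lm() just to keep the example self-contained:

```r
# Hypothetical constructor: tag a plain list with its own class
new_predicted_list <- function(x) {
  structure(x, class = c("predicted_list", "list"))
}

# Hypothetical generic standing in for any function that consumes the result
ci_width <- function(x, ...) UseMethod("ci_width")

# Univariate method: works on a single CI data frame
ci_width.data.frame <- function(x, ...) x$CI_high - x$CI_low

# List method: lapply the generic over the elements
ci_width.predicted_list <- function(x, ...) {
  lapply(unclass(x), ci_width, ...)
}

# Build one CI data frame per response
per_response <- lapply(c(mpg = "mpg", hp = "hp"), function(resp) {
  fit <- lm(reformulate("cyl", response = resp), data = mtcars)
  ci <- predict(fit, interval = "confidence")
  data.frame(Predicted = ci[, "fit"], CI_low = ci[, "lwr"], CI_high = ci[, "upr"])
})

res <- new_predicted_list(per_response)
widths <- ci_width(res)
str(widths)
```

With this pattern, existing univariate methods stay unchanged and the list method is a one-line wrapper.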

bwiernik avatar Jul 19 '21 07:07 bwiernik