`get_data()` does not return scaled variables in subset.

Open strengejacke opened this issue 3 years ago • 4 comments

Works with log() etc., but not with scale(). I haven't looked into it in detail, but it looks a bit more complicated.

library(insight)
m <- lm(log(Sepal.Length) ~ scale(Sepal.Width), data = iris, subset = Species == "versicolor")
out <- get_data(m)

head(out)
#>    Sepal.Length Sepal.Width    Species
#> 51          7.0  0.32731751 versicolor
#> 52          6.4  0.32731751 versicolor
#> 53          6.9  0.09788935 versicolor
#> 54          5.5 -1.73753594 versicolor
#> 55          6.5 -0.59039513 versicolor
#> 56          5.7 -0.59039513 versicolor

head(subset(iris, 
            subset = Species == "versicolor", 
            select = c("Sepal.Length", "Sepal.Width", "Species")))
#>    Sepal.Length Sepal.Width    Species
#> 51          7.0         3.2 versicolor
#> 52          6.4         3.2 versicolor
#> 53          6.9         3.1 versicolor
#> 54          5.5         2.3 versicolor
#> 55          6.5         2.8 versicolor
#> 56          5.7         2.8 versicolor




m <- lm(log(Sepal.Length) ~ log(Sepal.Width), data = iris, subset = Species == "versicolor")
out <- get_data(m)

head(out)
#>    Sepal.Length Sepal.Width    Species
#> 51          7.0         3.2 versicolor
#> 52          6.4         3.2 versicolor
#> 53          6.9         3.1 versicolor
#> 54          5.5         2.3 versicolor
#> 55          6.5         2.8 versicolor
#> 56          5.7         2.8 versicolor

head(subset(iris, 
            subset = Species == "versicolor", 
            select = c("Sepal.Length", "Sepal.Width", "Species")))
#>    Sepal.Length Sepal.Width    Species
#> 51          7.0         3.2 versicolor
#> 52          6.4         3.2 versicolor
#> 53          6.9         3.1 versicolor
#> 54          5.5         2.3 versicolor
#> 55          6.5         2.8 versicolor
#> 56          5.7         2.8 versicolor

Created on 2022-03-21 by the reprex package (v2.0.1)

strengejacke avatar Mar 21 '22 08:03 strengejacke

Is the issue because scale() returns a matrix instead of a vector?

bwiernik avatar Mar 21 '22 09:03 bwiernik

I think so. It's treated as "matrix column", and not processed by .backtransform(). However, since we have the center and scale as attributes, maybe we should just backtransform "scale" as well?

strengejacke avatar Mar 21 '22 09:03 strengejacke
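
To illustrate the idea of back-transforming via the stored attributes, here is a minimal base R sketch. The helper name `unscale()` is hypothetical and is not part of insight; it only assumes that the column still carries the `scaled:center` and `scaled:scale` attributes that `scale()` attaches.

```r
# Hypothetical helper for illustration only (not an insight function).
# Reverses scale() using the attributes it stores on its result.
unscale <- function(x) {
  center <- attr(x, "scaled:center")
  sd_value <- attr(x, "scaled:scale")
  if (is.null(center) || is.null(sd_value)) {
    stop("Cannot back-transform: scaling attributes are missing.")
  }
  # as.vector() drops the one-column matrix structure;
  # then undo z = (x - center) / scale
  as.vector(x) * sd_value + center
}

z <- scale(iris$Sepal.Width)
is.matrix(z)                              # scale() returns a matrix
all.equal(unscale(z), iris$Sepal.Width)   # original values are recovered
```

This only works while the attributes are present, which is the sticking point when `subset` is used.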

Yes, I think so. We should convert the matrix column back to a standard vector column too.

bwiernik avatar Mar 21 '22 12:03 bwiernik

> We should convert the matrix column back to a standard vector column too

That's already done.

strengejacke avatar Mar 21 '22 12:03 strengejacke

Using `subset` drops the `scaled:center` and `scaled:scale` attributes, which is why it doesn't work:

m <- lm(log(Sepal.Length) ~ scale(Sepal.Width), data = iris, subset = Species == "versicolor")
attributes(model.frame(m)$`scale(Sepal.Width)`)
#> $dim
#> [1] 50  1

m <- lm(log(Sepal.Length) ~ scale(Sepal.Width), data = iris)
attributes(model.frame(m)$`scale(Sepal.Width)`)
#> $dim
#> [1] 150   1
#> 
#> $`scaled:center`
#> [1] 3.057333
#> 
#> $`scaled:scale`
#> [1] 0.4358663

That's indeed tricky... Maybe we should - for now - at least give a warning?

strengejacke avatar Nov 10 '22 08:11 strengejacke
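
A rough sketch of what such a warning could look like, together with one possible way to recover the original values. Both are assumptions for illustration: `can_unscale()` is a hypothetical helper, not an insight function, and the recovery step assumes the unsubsetted data (`iris` here) is still available. The reprex output above suggests that `scale()` is evaluated on all rows before subsetting, so the center and scale are taken from the full data.

```r
# Model from the issue: the model frame column loses its scaling attributes.
m <- lm(log(Sepal.Length) ~ scale(Sepal.Width), data = iris,
        subset = Species == "versicolor")
z <- model.frame(m)$`scale(Sepal.Width)`

# Hypothetical helper (not part of insight): returns TRUE if the column can
# be back-transformed, otherwise warns and returns FALSE.
can_unscale <- function(x, name) {
  ok <- !is.null(attr(x, "scaled:center")) && !is.null(attr(x, "scaled:scale"))
  if (!ok) {
    warning(
      sprintf("Cannot back-transform `%s`: scaling attributes are missing.", name),
      call. = FALSE
    )
  }
  ok
}

# Warning suppressed here only to keep the example output quiet.
ok <- suppressWarnings(can_unscale(z, "scale(Sepal.Width)"))

# Assumption: the full data is available, so center and scale can be
# recomputed from it and applied to the subsetted column.
ref <- scale(iris$Sepal.Width)
recovered <- as.vector(z) * attr(ref, "scaled:scale") + attr(ref, "scaled:center")
all.equal(recovered, iris$Sepal.Width[iris$Species == "versicolor"])
```

Recomputing from the full data would not help when the original data frame is no longer in scope, which is where a warning would remain the fallback.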

Should we close this in favor of #691?

vincentarelbundock avatar Dec 07 '22 01:12 vincentarelbundock

Let me check whether the examples in this issue work with #691. If so, we can close this.

strengejacke avatar Dec 07 '22 06:12 strengejacke