insight
`get_data()` does not return scaled variables in subset.
Works with log() etc., but not with scale(). I haven't looked into it in detail, but it looks a bit more complicated.
library(insight)
m <- lm(log(Sepal.Length) ~ scale(Sepal.Width), data = iris, subset = Species == "versicolor")
out <- get_data(m)
head(out)
#> Sepal.Length Sepal.Width Species
#> 51 7.0 0.32731751 versicolor
#> 52 6.4 0.32731751 versicolor
#> 53 6.9 0.09788935 versicolor
#> 54 5.5 -1.73753594 versicolor
#> 55 6.5 -0.59039513 versicolor
#> 56 5.7 -0.59039513 versicolor
head(subset(iris,
subset = Species == "versicolor",
select = c("Sepal.Length", "Sepal.Width", "Species")))
#> Sepal.Length Sepal.Width Species
#> 51 7.0 3.2 versicolor
#> 52 6.4 3.2 versicolor
#> 53 6.9 3.1 versicolor
#> 54 5.5 2.3 versicolor
#> 55 6.5 2.8 versicolor
#> 56 5.7 2.8 versicolor
m <- lm(log(Sepal.Length) ~ log(Sepal.Width), data = iris, subset = Species == "versicolor")
out <- get_data(m)
head(out)
#> Sepal.Length Sepal.Width Species
#> 51 7.0 3.2 versicolor
#> 52 6.4 3.2 versicolor
#> 53 6.9 3.1 versicolor
#> 54 5.5 2.3 versicolor
#> 55 6.5 2.8 versicolor
#> 56 5.7 2.8 versicolor
head(subset(iris,
subset = Species == "versicolor",
select = c("Sepal.Length", "Sepal.Width", "Species")))
#> Sepal.Length Sepal.Width Species
#> 51 7.0 3.2 versicolor
#> 52 6.4 3.2 versicolor
#> 53 6.9 3.1 versicolor
#> 54 5.5 2.3 versicolor
#> 55 6.5 2.8 versicolor
#> 56 5.7 2.8 versicolor
Created on 2022-03-21 by the reprex package (v2.0.1)
Is the issue because scale() returns a matrix instead of a vector?
I think so. It's treated as "matrix column", and not processed by .backtransform(). However, since we have the center and scale as attributes, maybe we should just backtransform "scale" as well?
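As a rough sketch of what back-transforming "scale" could look like, assuming the `scaled:center` and `scaled:scale` attributes are still attached to the column. `unscale()` is a hypothetical helper for illustration, not an existing insight function:

```r
# Hypothetical helper (not part of insight): undo scale() via its attributes.
unscale <- function(x) {
  center <- attr(x, "scaled:center")
  scl <- attr(x, "scaled:scale")
  if (is.null(center) || is.null(scl)) {
    stop("No 'scaled:center'/'scaled:scale' attributes found.")
  }
  # as.vector() drops the matrix dim, giving a plain numeric vector
  as.vector(x) * scl + center
}

z <- scale(iris$Sepal.Width)
all.equal(unscale(z), iris$Sepal.Width)
#> [1] TRUE
```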
Yes, I think so. We should convert the matrix column back to a standard vector column too.
> We should convert the matrix column back to a standard vector column too
That's already done.
Using `subset` drops the `scaled:scale` and `scaled:center` attributes, which is why it doesn't work:
m <- lm(log(Sepal.Length) ~ scale(Sepal.Width), data = iris, subset = Species == "versicolor")
attributes(model.frame(m)$`scale(Sepal.Width)`)
#> $dim
#> [1] 50 1
m <- lm(log(Sepal.Length) ~ scale(Sepal.Width), data = iris)
attributes(model.frame(m)$`scale(Sepal.Width)`)
#> $dim
#> [1] 150 1
#>
#> $`scaled:center`
#> [1] 3.057333
#>
#> $`scaled:scale`
#> [1] 0.4358663
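The same loss can be reproduced outside of lm(): row-subsetting a matrix with `[` keeps only `dim` (and dimnames), which is presumably what happens to the matrix column when `subset` is applied. The sketch below also shows one possible workaround, under the assumption that the full, unsubsetted data is still available: since the reprex values show scale() being computed on all rows before subsetting, center and scale can be recomputed from the full column.

```r
z <- scale(iris$Sepal.Width)
idx <- iris$Species == "versicolor"

# Row-subsetting the scaled matrix drops the scaling attributes
z_sub <- z[idx, , drop = FALSE]
is.null(attr(z_sub, "scaled:center"))
#> [1] TRUE

# Assumption: the original data is available, so center/scale can be recomputed
center <- mean(iris$Sepal.Width)
scl <- sd(iris$Sepal.Width)
orig <- as.vector(z_sub) * scl + center
all.equal(orig, iris$Sepal.Width[idx])
#> [1] TRUE
```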
That's indeed tricky... Maybe we should - for now - at least give a warning?
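A minimal sketch of what such a warning check might look like. `warn_if_unscalable()` is a hypothetical name, and detecting scale() terms by column name is an assumption made for illustration, not how insight does it internally:

```r
# Hypothetical check (not insight's actual code): warn when a scale() term
# in the model frame has lost both of its scaling attributes.
warn_if_unscalable <- function(mf) {
  for (nm in names(mf)) {
    is_scale_term <- startsWith(nm, "scale(")
    lost_attrs <- is.null(attr(mf[[nm]], "scaled:center")) &&
      is.null(attr(mf[[nm]], "scaled:scale"))
    if (is_scale_term && lost_attrs) {
      warning("Could not back-transform `", nm, "`: scaling attributes are missing.",
              call. = FALSE)
    }
  }
  invisible(mf)
}

m <- lm(log(Sepal.Length) ~ scale(Sepal.Width), data = iris, subset = Species == "versicolor")
warn_if_unscalable(model.frame(m))
```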
Should we close this in favor of #691?
Let me check the examples in this issue and whether they work with #691. If so, we can close.