insight
`get/find_transformation` with linear transformations
- `get/find_transformation` should not return "identity" if an unsupported transformation is present.
- Should support linear transformations?
m <- lm(I(2 * mpg + 3) ~ hp, mtcars)
insight::find_transformation(m)
#> [1] "identity"
Created on 2022-06-27 by the reprex package (v2.0.1)
Is there a context where these concerns arise other than I()?
Generally, we should extract the contents of I() and evaluate that as a function with numerical derivatives
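A rough sketch of that idea in base R, not anything insight currently does: `response_fun()` and `is_linear()` are hypothetical helper names, and the sketch assumes the response expression contains exactly one variable. It pulls the expression out of `I()`, wraps it in a function, and uses central differences to check whether the slope is constant.

```r
# Hypothetical helper: turn the response side of a formula into a function of y
response_fun <- function(formula) {
  lhs <- formula[[2]]
  # Strip the I() wrapper if present
  if (is.call(lhs) && identical(lhs[[1]], as.name("I"))) {
    lhs <- lhs[[2]]
  }
  vars <- all.vars(lhs)
  stopifnot(length(vars) == 1) # assumption: a single response variable
  function(y) eval(lhs, setNames(list(y), vars))
}

# Hypothetical helper: numerical derivative at a few points; constant slope => linear
is_linear <- function(f, at = c(1, 2, 5, 10), h = 1e-4) {
  d <- (f(at + h) - f(at - h)) / (2 * h)
  isTRUE(all.equal(d, rep(d[1], length(d)), tolerance = 1e-6))
}

f <- response_fun(I(2 * mpg + 3) ~ hp)
g <- response_fun(log(mpg) ~ hp)
is_linear(f)
is_linear(g)
```

Here `is_linear(f)` should come out `TRUE` and `is_linear(g)` `FALSE`; the evaluation points and tolerance are arbitrary choices.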
For (1), a user can use some other unsupported function, e.g. foo(y) ~ 1, or datawizard::ranktransform(y) ~ ..
And (2) should also work for functions that apply a linear transformation:
- scale() and datawizard::standardise/standardize() / datawizard::center/centre() (all 3 of which can be inverted with datawizard::unstandardise/unstandardize()) https://github.com/easystats/datawizard/pull/191
- datawizard::normalize() and datawizard::change_scale/data_rescale(), which can be inverted with datawizard::unnormalize() https://github.com/easystats/datawizard/pull/191
The problem is how to detect foo()? cbind(x - y) should return "identity", foo() should return "unknown". Are there any other exceptions?
Perhaps we should just return "identity" if no function or manipulation is detected? All others can be NULL?
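One possible way to sketch that rule, purely as an illustration: `classify_response()` is a hypothetical function, not insight's implementation, and the `known` and `passthrough` lists are assumptions. It collects the function names used on the response side and returns "identity" only when nothing beyond the pass-through calls is found. It returns "unknown" for everything unrecognised, where the suggestion above would use NULL.

```r
# Hypothetical classifier for the response side of a formula
classify_response <- function(formula,
                              known = c("log", "log1p", "sqrt", "exp"),
                              passthrough = c("cbind", "-")) {
  lhs <- formula[[2]]
  # Function names = all names in the expression minus the variable names
  funs <- setdiff(all.names(lhs, unique = TRUE), all.vars(lhs))
  if (length(funs) == 0 || all(funs %in% passthrough)) {
    return("identity")
  }
  hit <- intersect(funs, known)
  if (length(funs) == 1 && length(hit) == 1) {
    return(hit)
  }
  "unknown"
}

classify_response(mpg ~ hp)
classify_response(cbind(x - y) ~ z)
classify_response(log(mpg) ~ hp)
classify_response(foo(y) ~ 1)
classify_response(I(2 * mpg + 3) ~ hp)
```

With these assumed lists, the first two calls give "identity", the third "log", and the last two "unknown". A variable that shares its name with a function would confuse the `setdiff()` step, so this is only a starting point.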
Here is some working code to make trans/inversetrans functions for linear transformation functions above:
Define functions
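# Generics: build the forward transformation and its inverse from a transformed vector
as_linear_transform <- function(x, ...) {
  UseMethod("as_linear_transform")
}
as_linear_inverse <- function(x, ...) {
  UseMethod("as_linear_inverse")
}
# Forward transformation: (x - a) / b
as_linear_transform.numeric <- function(x, ...) {
  coefs <- .get_ab(x)
  function(x) {
    (x - coefs["a"]) / coefs["b"]
  }
}
# Inverse transformation: x * b + a
as_linear_inverse.numeric <- function(x, ...) {
  coefs <- .get_ab(x)
  function(x) {
    x * coefs["b"] + coefs["a"]
  }
}
# Recover the offset (a) and scale (b) from the attributes left by the transformation
.get_ab <- function(x) {
  attr <- attributes(x)
  attr_names <- names(attr)
  if (all(c("center", "scale") %in% attr_names)) {
    # datawizard::standardize() / center()
    a <- attr[["center"]]
    b <- attr[["scale"]]
  } else if (all(c("scaled:center", "scaled:scale") %in% attr_names)) {
    # base::scale()
    a <- attr[["scaled:center"]]
    b <- attr[["scaled:scale"]]
  } else if (all(c("min_value", "range_difference") %in% attr_names)) {
    # datawizard::normalize() / change_scale()
    a <- attr[["min_value"]]
    b <- attr[["range_difference"]]
    if ("to_range" %in% attr_names) {
      to_range <- attr[["to_range"]]
      b <- (b / diff(to_range))
      a <- a - b * to_range[1]
    }
  } else {
    # Without this branch, `a` and `b` would be undefined below
    stop("No centering/scaling attributes found on `x`.", call. = FALSE)
  }
  c(a = a, b = b)
}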
library(datawizard)
x <- rnorm(4, 40, 13)
Build trans/inverse functions from linear transformation functions in datawizard
foo <- as_linear_transform(standardize(x))
foo(x)
#> [1] -1.39325099 0.09865699 0.32238805 0.97220595
standardize(x)
#> [1] -1.39325099 0.09865699 0.32238805 0.97220595
#> attr(,"center")
#> [1] 40.5878
#> attr(,"scale")
#> [1] 11.42871
#> attr(,"robust")
#> [1] FALSE
foo <- as_linear_transform(scale(x))
foo(x)
#> [1] -1.39325099 0.09865699 0.32238805 0.97220595
scale(x)
#> [,1]
#> [1,] -1.39325099
#> [2,] 0.09865699
#> [3,] 0.32238805
#> [4,] 0.97220595
#> attr(,"scaled:center")
#> [1] 40.5878
#> attr(,"scaled:scale")
#> [1] 11.42871
foo <- as_linear_transform(change_scale(x, to = c(3, 14.5), range = c(-30, 200)))
foo(x)
#> [1] 5.733237 6.585766 6.713614 7.084943
change_scale(x, to = c(3, 14.5), range = c(-30, 200))
#> [1] 5.733237 6.585766 6.713614 7.084943
#> attr(,"min_value")
#> [1] -30
#> attr(,"range_difference")
#> [1] 230
#> attr(,"to_range")
#> [1] 3.0 14.5
Build inverse transformation functions from linear transformation functions in datawizard
goo <- as_linear_inverse(center(x))
x
#> [1] 24.66474 41.71532 44.27228 51.69886
goo(center(x))
#> [1] 24.66474 41.71532 44.27228 51.69886
#> attr(,"center")
#> [1] 40.5878
#> attr(,"scale")
#> [1] 1
#> attr(,"robust")
#> [1] FALSE
goo <- as_linear_inverse(normalize(x))
x
#> [1] 24.66474 41.71532 44.27228 51.69886
goo(normalize(x))
#> [1] 24.66474 41.71532 44.27228 51.69886
#> attr(,"include_bounds")
#> [1] TRUE
#> attr(,"min_value")
#> [1] 24.66474
#> attr(,"range_difference")
#> [1] 27.03411
goo <- as_linear_inverse(scale(x))
x
#> [1] 24.66474 41.71532 44.27228 51.69886
goo(scale(x))
#> [,1]
#> [1,] 24.66474
#> [2,] 41.71532
#> [3,] 44.27228
#> [4,] 51.69886
#> attr(,"scaled:center")
#> [1] 40.5878
#> attr(,"scaled:scale")
#> [1] 11.42871
Created on 2022-07-05 by the reprex package (v2.0.1)
Perhaps we should just return "identity" if no function or manipulation is detected? All others can be NULL?
But cbind() is a function and should not return "unknown".
I'm not following either of your last comments @mattansb
@bwiernik I gave examples of functions that perform simple linear transformations (scale, center, standardize, normalize and change_scale) that could potentially be used in a formula (e.g., scale(y) ~ x) and how to obtain the transformation functions and their inverse (which is what get_transformation() returns, potentially).
I thought when we talk about "transformation" in the meaning of this function, we're talking about a different scale, like normal -> log, or normal -> exp, not standardizing/centering. So you suggest including those as well?
Hmmm I think it might be useful; having scale(y) ~ x give a transformation of "identity" might be a little misleading, perhaps?
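To make the "misleading" point concrete, here is a small base R sketch using only `lm()`, `scale()` and `mtcars`; the variable names are just for illustration. The slope from a model with a scaled response is in standard-deviation units, so it has to be multiplied by `sd(mpg)` to get back to the raw scale. That is exactly the back-transformation an "identity" label would hide.

```r
m_raw <- lm(mpg ~ hp, data = mtcars)
m_scaled <- lm(scale(mpg) ~ hp, data = mtcars)

b_raw <- as.vector(coef(m_raw))[2]       # slope in mpg units
b_scaled <- as.vector(coef(m_scaled))[2] # slope in SD units of mpg

# Back-transforming the scaled slope recovers the raw slope
back_transformed <- b_scaled * sd(mtcars$mpg)
isTRUE(all.equal(back_transformed, b_raw))
```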
But if this would be too much work / break some stuff, we can save this issue for a later date (:
Perhaps we could add a custom output first and then refine?