performance Alternative R2?

Problem

R2 is affected by the absence of an intercept:

data <- mtcars |> 
  within({
    M = model.matrix(~1 + hp)
  })

m1 <- lm(mpg ~ 1 + hp, data = data)
m2 <- lm(mpg ~ 0 + M, data = data) # same model?

Same parameters…

parameters::compare_parameters(m1, m2)
#> Warning in mapply(function(.d, .l) {: longer argument not a multiple of length
#> of shorter
#> Warning: Some model terms could not be found in model data.
#>   You probably need to load the data into the environment.
#> Parameter    |                   m1 |                   m2
#> ----------------------------------------------------------
#> (Intercept)  | 30.10 (26.76, 33.44) |                     
#> hp           | -0.07 (-0.09, -0.05) |                     
#> M(Intercept) |                      | 30.10 (26.76, 33.44)
#> Mhp          |                      | -0.07 (-0.09, -0.05)
#> ----------------------------------------------------------
#> Observations |                   32 |                   32

Same dfs…

df.residual(m1)
#> [1] 30
df.residual(m2)
#> [1] 30

Same predictions…

all.equal(predict(m1), predict(m2))
#> [1] TRUE

But…

performance::r2(m1)
#> # R2 for Linear Regression
#>        R2: 0.602
#>   adj. R2: 0.589
performance::r2(m2)
#> # R2 for Linear Regression
#>        R2: 0.968
#>   adj. R2: 0.966

This isn’t an issue in performace, but R seems to give different values based on whether or not there is an intercept in the model

summary(m1)$r.squared
#> [1] 0.6024373
summary(m2)$r.squared
#> [1] 0.9681196

^{Created on 2022-10-14 by the reprex package (v2.0.1)}

Solution

Perhaps we can add an r2_prediction which computes the R2 values based on the prediction?

Either as: $R^2 = r_{\hat{y},y}^2$

Or (my preferred method) as $R^2 = 1 - \frac{Var(y - \hat{y})}{Var(y)}$

This will allow for correct R2 for transformed outcomes (using smearing), for non-linear models, etc...

Oct 14 '22 17:10 mattansb

Can you check what r2.default() returns?

Oct 14 '22 18:10 strengejacke

R is (correctly) adjusting its computation of the predicted sum of squares based on whether there's an intercept or not.

We shouldn't expect these two results to be the same because your model has an intercept in effect even though you told R it didn't.


r <- z$residuals
f <- z$fitted.values

mss <- if (attr(z$terms, "intercept"))
            sum((f - mean(f))^2) else sum(f^2)
        rss <- sum(r^2)

if (p != attr(z$terms, "intercept")) {
	df.int <- if (attr(z$terms, "intercept")) 1L else 0L
	ans$r.squared <- mss/(mss + rss)
	ans$adj.r.squared <- 1 - (1 - ans$r.squared) * ((n - df.int)/rdf)
	ans$fstatistic <- c(value = (mss/(p - df.int))/resvar,
			    numdf = p - df.int, dendf = rdf)
    } else ans$r.squared <- ans$adj.r.squared <- 0

Oct 14 '22 22:10 bwiernik

What is the justification for this formula @bwiernik ? I haven't come across it before...

Oct 16 '22 11:10 mattansb

See this great StackExchange answer https://stats.stackexchange.com/a/26205/364001

Oct 16 '22 11:10 bwiernik

performance performance copied to clipboard

Alternative R2?

Problem

Solution

performance
performance copied to clipboard