How to turn off CIs for check_model?

Open jbohenek opened this issue 2 years ago • 4 comments

I am using check_model() for a course, and it's a wonderful teaching tool. However, sometimes the confidence interval bands generated from the LOESS fit in geom_smooth() are ridiculously large, which expands the axes and makes any potential pattern in the residuals unnoticeable. Is there a quick and easy way to turn off CIs so the y-axis behaves? I wasn't able to discover a quick fix for this besides extracting components from check_model() and replotting with geom_smooth(se=F) or forcing it linear with geom_smooth(method="lm").

reprex:

library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.2.3
#> Warning: package 'ggplot2' was built under R version 4.2.3
#> Warning: package 'tibble' was built under R version 4.2.3
#> Warning: package 'readr' was built under R version 4.2.3
#> Warning: package 'purrr' was built under R version 4.2.3
#> Warning: package 'dplyr' was built under R version 4.2.3
#> Warning: package 'lubridate' was built under R version 4.2.3
library(easystats)
#> Warning: package 'easystats' was built under R version 4.2.3
#> # Attaching packages: easystats 0.6.0 (red = needs update)
#> ✔ bayestestR  0.13.1   ✔ correlation 0.8.4 
#> ✔ datawizard  0.9.0    ✔ effectsize  0.8.6 
#> ✖ insight     0.19.5   ✔ modelbased  0.8.6 
#> ✔ performance 0.10.5   ✔ parameters  0.21.2
#> ✔ report      0.5.7    ✔ see         0.8.0 
#> 
#> Restart the R-Session and update packages in red with `easystats::easystats_update()`.
df<-read_csv("https://raw.githubusercontent.com/jbohenek/biol_5130/main/viagra_data.csv") |> mutate(dose=factor(dose))
#> Rows: 30 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (3): dose, libido, partnerLibido
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

lm(libido ~ dose, data=df) |> 
  check_model()

Created on 2023-10-25 with reprex v2.0.2
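
The manual workaround described above (replotting with the confidence band off and a linear smoother) might look roughly like the sketch below. It does not touch check_model() internals; it rebuilds a residuals-vs-fitted plot directly from the model. The built-in PlantGrowth data is an assumption standing in for the course data, chosen because it also has a single three-level factor as predictor.

```r
library(ggplot2)

# PlantGrowth ships with R; it stands in for the issue's data here
# (30 rows, one categorical predictor with three levels).
fit <- lm(weight ~ group, data = PlantGrowth)

# Collect the two quantities a residuals-vs-fitted plot needs.
diag_df <- data.frame(
  fitted = fitted(fit),
  resid  = residuals(fit)
)

p <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  # method = "lm" avoids LOESS bending between the few distinct fitted values;
  # se = FALSE drops the confidence band that was stretching the y-axis.
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Fitted values", y = "Residuals")

print(p)
```

With se = FALSE the y-axis range is driven by the residuals themselves rather than by the band around the smoother.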

jbohenek avatar Oct 25 '23 15:10 jbohenek

@IndrajeetPatil We could add a ci argument or similar to performance::check_model(), save it as an attribute, and then in see we would set se = FALSE in geom_smooth().
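
A minimal sketch of the attribute hand-off proposed here, assuming a two-step design where one function computes the diagnostics and another plots them. The names check_sketch(), plot_sketch(), and show_ci are hypothetical and are not the performance or see API; only attr() and geom_smooth(se = ...) are real.

```r
library(ggplot2)

# Hypothetical stand-in for the checking step: computes the data
# and stores the user's choice as an attribute on the result.
check_sketch <- function(model, show_ci = TRUE) {
  out <- data.frame(fitted = fitted(model), resid = residuals(model))
  attr(out, "show_ci") <- show_ci
  out
}

# Hypothetical stand-in for the plotting step: reads the attribute
# back and forwards it to geom_smooth() as `se`.
plot_sketch <- function(x) {
  show_ci <- isTRUE(attr(x, "show_ci"))
  ggplot(x, aes(x = fitted, y = resid)) +
    geom_point() +
    geom_smooth(method = "loess", formula = y ~ x, se = show_ci)
}

# A continuous predictor keeps LOESS well behaved for the demo.
res <- check_sketch(lm(mpg ~ wt, data = mtcars), show_ci = FALSE)
p <- plot_sketch(res)
```

Storing the flag on the returned object means the plotting code needs no extra argument of its own; it only has to look the attribute up.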

strengejacke avatar Feb 16 '24 19:02 strengejacke

That would be good.

I wonder if we should also try to detect whether the number of discrete fitted values is small (or all the predictors are categorical), and then omit the linearity plot and change the homogeneity plot to something better suited to categorical regression?
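
One way such a detection could be sketched is to count the distinct fitted values. The helper name has_few_fitted_values() and the cutoff of 5 are assumptions for illustration only, not anything the package defines.

```r
# Hypothetical helper: TRUE when the model yields only a handful of
# distinct fitted values, as happens with purely categorical predictors.
# The default threshold of 5 is an arbitrary illustrative choice.
has_few_fitted_values <- function(model, threshold = 5) {
  # Round first so floating-point noise does not inflate the count.
  n_distinct <- length(unique(round(fitted(model), 8)))
  n_distinct <= threshold
}

categorical_fit <- lm(weight ~ group, data = PlantGrowth)  # 3 group means
continuous_fit  <- lm(mpg ~ wt, data = mtcars)             # many fitted values

has_few_fitted_values(categorical_fit)
has_few_fitted_values(continuous_fit)
```

Counting fitted values rather than inspecting predictor types also catches numeric predictors that take only a few values.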

bwiernik avatar Feb 17 '24 12:02 bwiernik

Upon further testing, it's not always just the SE that causes issues with visualization. Sometimes the LOESS curve bends in odd ways between discrete values, which produces the same effect as seen above. So in addition to se=FALSE, maybe also method="lm"? I know a linear fit isn't ideal when evaluating these things, but it's better than nothing when the y-axis goes haywire. For example, see the check_model() outlier plot of a 2x2 factorial ANOVA below (this can of course happen with any of the plots with a LOESS curve).


library(tidyverse)
library(easystats)
#> Warning: package 'easystats' was built under R version 4.2.3
#> # Attaching packages: easystats 0.7.0
#> ✔ bayestestR  0.13.2   ✔ correlation 0.8.4 
#> ✔ datawizard  0.9.1    ✔ effectsize  0.8.6 
#> ✔ insight     0.19.8   ✔ modelbased  0.8.7 
#> ✔ performance 0.10.9   ✔ parameters  0.21.5
#> ✔ report      0.5.8    ✔ see         0.8.2
df<-read_csv("https://raw.githubusercontent.com/jbohenek/biol_5130/main/opsin.csv")
#> Rows: 33 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): population, water
#> dbl (1): sws1
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
fit<-lm(sws1 ~ water*population, data=df)
check_model(fit)

[image: check_model() output for the model above]

Created on 2024-02-17 with reprex v2.1.0

jbohenek avatar Feb 17 '24 23:02 jbohenek

We should probably just have alternative visualizations for categorical models. The current plots really only work for models with continuous predictors, where there are numerous fitted values on the x-axis.
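
As one possible illustration of what an alternative for categorical models could look like (this is an assumption, not a design the maintainers have settled on), the residuals can be shown per fitted group with boxplots and jittered points, so no smoother is needed at all. PlantGrowth again stands in for the issue's data.

```r
library(ggplot2)

fit <- lm(weight ~ group, data = PlantGrowth)

# Treat each distinct fitted value as a group label instead of a
# position on a continuous axis.
diag_df <- data.frame(
  fitted_group = factor(round(fitted(fit), 3)),
  resid        = residuals(fit)
)

p <- ggplot(diag_df, aes(x = fitted_group, y = resid)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  # Hide boxplot outlier markers because every point is drawn by the jitter layer.
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.1, height = 0) +
  labs(x = "Fitted value (one per group)", y = "Residuals")

print(p)
```

Comparing the spread of the boxes gives a rough view of homogeneity of variance across groups without any curve between the discrete fitted values.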

bwiernik avatar Feb 18 '24 02:02 bwiernik