
check_model: Enhance point identification

Open friendly opened this issue 4 months ago • 6 comments

I would like to use check_model() as a substitute for stats::plot.lm() because it generally gives prettier and more informative plots!

However, it does not seem to meet my requirements for sensibly labelling noteworthy points in all the panels, or for controlling the related graphical features (e.g., point size/color). Or perhaps I missed something in the documentation?

Here is a minimal example. The important thing is that one case (number 12) is highly influential and should stand out in all the plots.

```r
library(tidyverse)
library(performance)

data(Davis, package = "carData")
# remove rows with missing values
Davis <- Davis |>
  drop_na()
davis.mod <- lm(repwt ~ weight * sex, data = Davis)

check_model(davis.mod,
            check = c("linearity", "qq",
                      "homogeneity", "outliers"))
```

This gives:

[Image: check_model() output for davis.mod]

Compare this with the result of plot.lm(), where I used the options id.n, cex.id, and others to make the points I wanted to highlight stand out.

```r
op <- par(mfrow = c(2, 2), mar = c(5, 5, 3, 1) + .1)
plot(davis.mod,
     cex.lab = 1.2, cex = 1.1,
     id.n = 2, cex.id = 1.2, lwd = 2)
par(op)
```

This gives:

[Image: plot.lm() output for davis.mod]

So, may I suggest enhancing the plots to make this possible?

friendly • Aug 21 '25 19:08

@easystats/core-team, do you have any ideas on how best to implement this?

strengejacke • Aug 31 '25 10:08

Looking at plot.lm(), it seems that the 3 most extreme points (the default, id.n = 3) are tagged based on either their abs(residual) or their Cook's distance, and then the same 3 points are added as text labels to all the plots.
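A minimal base-R sketch of that selection rule, assuming `mtcars` as a stand-in for the Davis data (which needs carData): rank observations by |residual| or by Cook's distance and keep the top `id.n`. `top_ids()` is a hypothetical helper, not a function from performance or stats.

```r
# Hypothetical helper (not from performance or stats): indices of the id.n
# observations with the largest |x|, most extreme first.
top_ids <- function(x, id.n = 3) {
  order(abs(x), decreasing = TRUE)[seq_len(min(id.n, length(x)))]
}

# mtcars stands in for the Davis data used in the issue
fit <- lm(mpg ~ wt * am, data = mtcars)

by_resid <- top_ids(residuals(fit))       # rule for the residual-based panels
by_cook  <- top_ids(cooks.distance(fit))  # rule for the influence panel

# observation labels, taken from the row names of the data
rownames(mtcars)[by_resid]
rownames(mtcars)[by_cook]
```

The two rankings need not pick the same cases, so whether to use one label set for every panel or one per panel is a genuine design choice.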

mattansb • Aug 31 '25 10:08

So that would mean taking the same points we label in the Influential Observations plot and labelling them in all the plots that show points.
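One way to realize this, sketched in base R under stated assumptions: flag observations once, using the largest Cook's distances as the Influential Observations panel would, and store the labels in a single diagnostics data frame that every panel draws from. `diag_data()` and its column names are hypothetical and are not part of performance's internals.

```r
# Hypothetical helper: one data frame of diagnostics with a shared label column.
diag_data <- function(model, n = 3) {
  cook <- cooks.distance(model)
  # flag once: the n largest Cook's distances
  flagged <- order(cook, decreasing = TRUE)[seq_len(min(n, length(cook)))]
  rs <- rstandard(model)
  d <- data.frame(
    obs       = seq_along(cook),
    fitted    = fitted(model),
    resid     = residuals(model),
    std_resid = rs,
    sqrt_abs  = sqrt(abs(rs)),    # y-axis of a scale-location style panel
    leverage  = hatvalues(model),
    cook      = cook
  )
  # normal quantiles for a Q-Q panel, matched to the ranks of the residuals
  d$theoretical <- qnorm(ppoints(nrow(d)))[rank(rs, ties.method = "first")]
  # label only the flagged rows; NA everywhere else
  d$label <- ifelse(d$obs %in% flagged, as.character(d$obs), NA_character_)
  d
}

d <- diag_data(lm(mpg ~ wt * am, data = mtcars))
d[!is.na(d$label), c("obs", "cook", "label")]
```

In a ggplot2 panel, a layer such as `geom_text(aes(label = label), na.rm = TRUE)` would then tag the same rows in every plot.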

bwiernik • Aug 31 '25 13:08

In my heplots package, I've made a stab at doing this quite generally, but it's still incomplete.

noteworthy() defines a method to identify noteworthy observations based on various criteria: extreme X, Y, or residual values, Mahalanobis D^2, or even an externally computed vector.
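To make these criteria concrete, here is a hypothetical base-R function in the spirit of noteworthy(). It is NOT the heplots API; its name and arguments are assumptions. It ranks cases by |standardized residual|, by the distance of Y from its mean, by Mahalanobis D^2 of the predictors (via `stats::mahalanobis()`), or by a user-supplied score vector.

```r
# Hypothetical sketch; not heplots::noteworthy(), whose interface may differ.
flag_noteworthy <- function(model, what = c("resid", "y", "mahal"), n = 3,
                            score = NULL) {
  what <- match.arg(what)
  if (is.null(score)) {           # an external score vector overrides `what`
    score <- switch(what,
      resid = abs(rstandard(model)),      # extreme standardized residuals
      y = {                               # extreme response values
        y <- model.response(model.frame(model))
        abs(y - mean(y))
      },
      mahal = {                           # Mahalanobis D^2 in predictor space
        X <- model.matrix(model)[, -1, drop = FALSE]  # assumes an intercept
        mahalanobis(X, colMeans(X), cov(X))
      }
    )
  }
  order(score, decreasing = TRUE)[seq_len(min(n, length(score)))]
}

fit <- lm(mpg ~ wt * am, data = mtcars)
flag_noteworthy(fit, "mahal")   # 3 cases farthest from the predictor centroid
```

Dropping the first model-matrix column assumes the model has an intercept; factor predictors enter D^2 through their dummy columns, which is a design choice rather than a requirement.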

Following a discussion on ggplot2-extenders (https://github.com/ggplot2-extenders/ggplot-extension-club/discussions/91), I have a ggplot2 stat, stat_noteworthy(). It's not quite working for my test cases yet. You are welcome to work with it, and I'd be grateful if you got it working or improved it.

For the standard regression quartet of plots, it would make sense to use different criteria in the various plots.
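A sketch of per-panel criteria, keyed by the `check` names used in the check_model() call above. The pairing of each panel with a criterion is an assumption loosely modeled on plot.lm(), and `panel_ids()` is hypothetical.

```r
# Hypothetical helper: a separate set of labelled cases for each panel.
panel_ids <- function(model, n = 3) {
  rs <- rstandard(model)
  criteria <- list(
    linearity   = abs(residuals(model)),  # residuals vs fitted
    qq          = abs(rs),                # Q-Q of standardized residuals
    homogeneity = abs(rs),                # sqrt(|rs|) ranks cases identically
    outliers    = cooks.distance(model)   # influence panel
  )
  lapply(criteria, function(score)
    order(score, decreasing = TRUE)[seq_len(min(n, length(score)))])
}

ids <- panel_ids(lm(mpg ~ wt * am, data = mtcars))
ids
unique(unlist(ids))   # union, if a single label set is preferred
```

Because sqrt(|r|) is monotone in |r|, a scale-location style panel flags the same cases as the Q-Q panel, so the sketch uses |r| for both.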

friendly • Aug 31 '25 15:08

I can take a stab at this.

bwiernik • Aug 31 '25 15:08

Perhaps we should use our outlier functions?

mattansb • Aug 31 '25 15:08