check_model: Enhance point identification
I would like to use check_model() as a substitute for stats::plot.lm() because it generally gives prettier and more informative plots!
However, it does not seem to meet my need for sensible labelling of noteworthy points in all the panels, or for control of the related graphic features (e.g., point size/color). Or perhaps I missed something in the documentation?
Here is a minimal example. What is important here is that there is one case (number 12) which is highly influential and should be made to stand out in all the plots.
library(tidyverse)
library(performance)
data(Davis, package = "carData")
# remove missings
Davis <- Davis |>
  drop_na()
davis.mod <- lm(repwt ~ weight * sex, data = Davis)
check_model(davis.mod,
            check = c("linearity", "qq",
                      "homogeneity", "outliers"))
This gives:
Compare with the result of plot.lm(). Here, I used options id.n, cex.id and others to make the points I wanted to highlight stand out.
op <- par(mfrow = c(2,2), mar = c(5, 5, 3, 1) + .1)
plot(davis.mod,
     cex.lab = 1.2, cex = 1.1,
     id.n = 2, cex.id = 1.2, lwd = 2)
par(op)
This gives:
So, can I suggest an enhancement to the plots produced to make this possible?
@easystats/core-team do you have some ideas how to best implement this?
Looking at plot.lm(), it seems like the 3 most extreme points are tagged based on either their abs(residual) or Cook's distance, and then the same (3) points are added as text to all the plots.
So that would mean taking the same points we label in the Influential Obs plot and labeling them in all the plots that show points.
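To make that selection rule concrete, here is a minimal base-R sketch of picking the n most extreme cases by absolute residual or by Cook's distance. The helper name extreme_ids() is hypothetical, and the built-in mtcars data is used only as a stand-in so the snippet runs without extra packages. It is not a copy of what plot.lm() does internally.

```r
# Hypothetical helper: names of the n most extreme cases under two criteria.
extreme_ids <- function(model, n = 3) {
  abs_res <- abs(stats::residuals(model))   # raw residuals, absolute value
  cook    <- stats::cooks.distance(model)   # Cook's distance per case
  list(
    residual = names(sort(abs_res, decreasing = TRUE))[seq_len(n)],
    cooks    = names(sort(cook, decreasing = TRUE))[seq_len(n)]
  )
}

# Stand-in model on a built-in data set (an assumption for illustration only)
fit <- lm(mpg ~ wt * factor(am), data = mtcars)
ids <- extreme_ids(fit, n = 2)

# One label column that could be reused in every panel:
# the case name for flagged cases, an empty string otherwise
flagged   <- union(ids$residual, ids$cooks)
plot_data <- data.frame(
  fitted   = fitted(fit),
  residual = residuals(fit),
  label    = ifelse(names(fitted(fit)) %in% flagged, names(fitted(fit)), "")
)
```

The label column could then be passed to a text layer in each panel, so the same cases stand out everywhere.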
In my heplots package, I've made a stab at doing this quite generally, but it's still incomplete.
noteworthy() defines a method to identify noteworthy observations based on various criteria: extreme X or Y or residual, or Mahalanobis D^2, or even an externally computed vector.
Following a discussion on ggplot2-extenders (https://github.com/ggplot2-extenders/ggplot-extension-club/discussions/91), I have a ggplot stat_noteworthy(). It's not quite working for my test cases. You are welcome to work with this, and I'd be grateful if you got it working or improved it.
For the standard regression quartet of plots, it would make sense to use different criteria in the various plots.
I can take a stab at this
We probably should use our outlier functions?
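As a rough sketch of that idea, the cases flagged by performance::check_outliers() could supply the labels. This assumes the returned object can be coerced to a plain logical vector with one entry per observation. The mtcars model is only a stand-in, and none of this reflects how check_model() currently works internally.

```r
library(performance)

# Stand-in model on a built-in data set (an assumption for illustration only)
fit <- lm(mpg ~ wt * factor(am), data = mtcars)

# Flag influential cases by Cook's distance, using the package default threshold
out <- check_outliers(fit, method = "cook")

# Assumption: the result coerces to a logical vector, one entry per row
flagged <- which(as.logical(out))

# Row names of the flagged cases, which could serve as point labels in each panel
labels <- rownames(mtcars)[flagged]
```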