`check_model()` performance degradation with large datasets
Problem
check_model() becomes unusably slow (5+ minutes) when checking models fitted on datasets with >10K observations.
Reproducible Example
    library(performance)
    library(lme4)

    # Large dataset
    data <- data.frame(
      subject = rep(1:500, each = 50),
      x = rnorm(25000),
      y = rnorm(25000)
    )

    model <- lmer(y ~ x + (1 | subject), data = data)
    check_model(model) # Hangs for minutes
Root Cause
The plot.check_model() function in R/plot.check_model.R plots ALL data points when show_dots = TRUE, causing rendering slowdown with large datasets.
Proposed Fix
Implement intelligent data sampling in R/plot.check_model.R:
    # Sample data when too large
    if (nrow(model_data) > 5000) {
      model_data <- model_data[sample(nrow(model_data), 5000), ]
    }
A random subsample of 5,000 points should still show the overall pattern in each diagnostic plot while cutting the number of points that have to be rendered.
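The sampling step could also be wrapped in a small helper so the threshold is configurable. The sketch below uses base R only; `downsample_rows()` and its `max_points` argument are hypothetical names for illustration and are not part of `performance` or `see`.

```r
# Hypothetical helper (not an existing performance/see function):
# cap the number of rows that are drawn as points.
downsample_rows <- function(model_data, max_points = 5000) {
  n <- nrow(model_data)
  # Small datasets are returned unchanged
  if (n <= max_points) {
    return(model_data)
  }
  # Sample row indices without replacement; sort() keeps the
  # retained rows in their original order
  keep <- sort(sample.int(n, max_points))
  model_data[keep, , drop = FALSE]
}

# Usage with simulated data of the same size as the example above
set.seed(123)
big <- data.frame(x = rnorm(25000), y = rnorm(25000))
small <- downsample_rows(big)
nrow(small) # 5000
```

Because the subsample is random, the plotted points would differ between calls unless a seed is set, which may be worth documenting if this approach is adopted.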
Related Links
- https://github.com/easystats/performance/issues/851
Environment
- R 4.3.0
- see 0.8.6
- performance 0.12.4
May I submit a PR with this fix?
Yes, that would be great.
@bwiernik I have opened a PR: https://github.com/easystats/see/pull/421
Please review it and share your feedback.
Let me know if there are any changes you would like me to make.
Thanks!