`check_model()` performance degradation with large datasets

Open ANAMASGARD opened this issue 2 months ago • 2 comments

Problem

check_model() becomes unusably slow (5+ minutes) when checking models fitted on datasets with >10K observations.

Reproducible Example

library(performance)
library(lme4)

# Large dataset
data <- data.frame(
  subject = rep(1:500, each = 50),
  x = rnorm(25000),
  y = rnorm(25000)
)

model <- lmer(y ~ x + (1 | subject), data = data)
check_model(model) # Hangs for minutes

Root Cause

The plot.check_model() function in R/plot.check_model.R plots ALL data points when show_dots = TRUE, causing rendering slowdown with large datasets.
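
As a stopgap, the points can be switched off when calling check_model(). This is a minimal sketch, assuming the installed version of performance exposes the show_dots argument and that see is installed for plotting; dat is simply a renamed copy of the data from the reproducible example.

```r
library(performance)
library(lme4)

# Same shape as the reproducible example: 500 subjects x 50 observations
set.seed(123)
dat <- data.frame(
  subject = rep(1:500, each = 50),
  x = rnorm(25000),
  y = rnorm(25000)
)

model <- lmer(y ~ x + (1 | subject), data = dat)

# show_dots = FALSE asks the diagnostic plots to skip the individual points
check_model(model, show_dots = FALSE)
```

This hides the points entirely rather than thinning them, so it only sidesteps the slowdown.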

Proposed Fix

Implement intelligent data sampling in R/plot.check_model.R:

# Sample data when too large
if (nrow(model_data) > 5000) {
  model_data <- model_data[sample(nrow(model_data), 5000), ]
}

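A random subsample of 5,000 points should keep the overall shape of the diagnostic plots recognizable while reducing the number of points that have to be rendered.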
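
The sampling step could be wrapped in a small helper so the threshold is configurable. This is only a sketch in base R: downsample_rows() and max_points are hypothetical names, not existing functions or arguments in see or performance, and the real change would have to fit into however R/plot.check_model.R prepares its data.

```r
# Hypothetical helper (not part of see or performance):
# returns at most `max_points` randomly chosen rows of `data`.
downsample_rows <- function(data, max_points = 5000L) {
  n <- nrow(data)
  if (n <= max_points) {
    return(data) # small enough already, leave untouched
  }
  # Draw row indices without replacement and sort them so the
  # retained rows stay in their original order
  keep <- sort(sample.int(n, max_points))
  data[keep, , drop = FALSE]
}

set.seed(42) # call set.seed() beforehand if the same subsample is needed each time
big <- data.frame(x = rnorm(25000), y = rnorm(25000))
small <- downsample_rows(big, max_points = 5000L)
nrow(small)
```

Only the data behind the points layer would need to go through such a helper. Any fitted lines or smoothers could still be computed from the full data, so that the subsample affects rendering cost but not the diagnostics themselves.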

Related links:

https://github.com/easystats/performance/issues/851

Environment

  • R 4.3.0
  • see 0.8.6
  • performance 0.12.4

May I submit a PR with this fix?

ANAMASGARD, Oct 24 '25 15:10

Yes that would be great

bwiernik, Oct 24 '25 23:10

@bwiernik Sir, I have raised a PR: https://github.com/easystats/see/pull/421

Please review it and share your feedback. Let me know if you would like me to make any changes.
Thanks!

ANAMASGARD, Oct 25 '25 09:10