broom icon indicating copy to clipboard operation
broom copied to clipboard

Improve performance of tidy.lm and tidy.glm for full-rank fits

Open capnrefsmmat opened this issue 1 year ago • 0 comments

tidy.lm and tidy.glm use summary(x)$coefficients, then join with coef(x) to handle the case where the fit is rank deficient and some terms with NA coefficients are missing from summary(x)$coefficients. However, in a simple experiment, the left_join() consumes a disproportionate amount of time. Only run the left_join() when the fit is rank-deficient and it is necessary, and add tests to ensure the rows are added correctly for rank-deficient fits.

An example:

library(broom)
library(profvis)

profvis(replicate(1000, tidy(lm(dist ~ speed, data = cars))))

Before this change, the expression ran in 3470 ms, of which 2570 ms (74%) was in tidy.lm() and 1590 ms (46%) was in left_join.data.frame(). After this change, the expression ran in 1740 ms, of which 1010 ms was in tidy.lm(), cutting 60% from the total time in tidy.lm(). This is useful for simulation studies that use tidy() to get estimates for each simulated fit.

Thanks for making a pull request to broom! A few things to keep in mind that will probably help this PR be merged more quickly:

  • [ ] Have you documented the change in NEWS.md?
    • As this doesn't change any user-facing API, I haven't, but can add it if performance changes are significant
  • [x] If this is your first time PRing to broom, have you added yourself as a contributor in the DESCRIPTION?
  • [x] Have you added any new vocabulary to inst/WORDLIST? (New vocabulary will be noted in the R-CMD-check from GitHub Actions.)
  • [ ] Does R-CMD-check pass on GitHub Actions? (Sometimes, checks may not be passing on the main branch already—if that's the case, just try to make sure your changes don't add any additional errors/warnings.)
  • [x] Have you updated DESCRIPTION if your feature/bug fix requires a specific version of a package?
  • [x] Have you added unit tests for any new functionality?

capnrefsmmat avatar Aug 06 '22 13:08 capnrefsmmat