estimatr icon indicating copy to clipboard operation
estimatr copied to clipboard

Catalog and discuss S3 methods that lm has

Open lukesonnet opened this issue 7 years ago • 17 comments

Have added

  • [x] confint.lm
  • [x] nobs.lm
  • [x] predict.lm - fix this to use the right variance
  • [x] print.lm
  • [x] summary.lm
  • [x] tidy.lm
  • [x] vcov.lm

Will add

  • [ ] anova.lm - may require the residuals or possibly more computation on the residuals
  • [ ] formula.lm
  • [ ] labels.lm - gets all RHS variable names from formula
  • [ ] variable.names.lm (free call to coefficient_name)
  • [ ] weights.lm - we actually already return weights when they are supplied as predict.lm needs them

Would be happy to add

  • [ ] simulate.lm - Simulates the outcome according to the fits and estimated noise in the model
  • The following are all about influence of individual data points, would necessitate keeping the n size objects (and maybe adding some computation in some cases where the hat matrix isn't free)
    • [ ] cooks.distance.lm
    • [ ] plot.lm
    • [ ] hatvalues.lm
    • [ ] influence.lm
    • [ ] dfbeta.lm
    • [ ] dfbetas.lm
    • [ ] rstandard.lm
    • [ ] rstudent.lm
  • Things that may require the residuals or possibly more computation on the residuals
    • [ ] residuals.lm
    • [ ] logLik.lm
    • [ ] deviance.lm
    • [ ] extractAIC.lm, regular AIC is what fultz prefers
  • [ ] model.frame.lm - requires keeping this obj
  • [ ] model.matrix.lm - requires keeping this obj

Don't really care to add

  • [ ] case.names.lm - Requires keeping obj of length n, can do if we keep any obj of that size
  • Things that require the QR object or computation on the QR object
    • [ ] qr.lm
    • [ ] alias.lm - returns a matrix of linearly dependent terms
    • [ ] kappa.lm
    • [ ] proj.lm
  • [ ] effects.lm

Won't add

  • [ ] add1.lm - This and drop are kind of cool; they take your model and add/remove parameters and refit and then run anova to compare them. Unnecessary though.
  • [ ] drop1.lm
  • [ ] coerce,oldClass,S3-method - people can just clobber our S3 class if they want
  • [ ] family.lm - we only have gaussian w/ identity link
  • [ ] dummy.coef.lm - used to get dummies created by model.matrix as one coefficient, but ends up just taking 2nd factor as a dummy and dropping the rest?
  • [ ] extract,lm-method - I think this is just used basically as a standin for $?
  • [ ] show,oldClass-method - To allow us to use something besides print to display
  • [ ] slotsFromS3,oldClass-method

lukesonnet avatar Feb 06 '18 19:02 lukesonnet

I had some students run into an issue where they wanted to use the fitted() and residuals() S3 methods and I was surprised to find out that we didn't support them. They wanted to do a fit versus residuals plot.

I assume this is because we don't retain the data required (same reason we don't implement the plot directly).

I helped them hack around it by using predict to extract the fit and residuals, but it seemed like a bit obscure.

Just wanted to surface this as user feedback, no particular necessity to respond to it.

aaronrudkin avatar Jun 04 '18 19:06 aaronrudkin

We just added returns for these but not the S3 methods. Have them get the new version and retrieve using $ or [[

lukesonnet avatar Jun 04 '18 19:06 lukesonnet

I'll add the S3s

lukesonnet avatar Jun 04 '18 19:06 lukesonnet

It would be nice to have at least residuals() and logLik() implemented (I think logLik needs residuals), to compute indices like AIC etc.

strengejacke avatar Apr 25 '19 09:04 strengejacke

Residuals can be computed quite easily, or is there some more complex stuff behind the scenes?

m1 <- iv_robust(mpg ~ gear + cyl | carb + wt, data = mtcars) residuals <- model.frame(m1)[[m1$outcome]] - m1$fitted.values

strengejacke avatar Apr 25 '19 09:04 strengejacke

Thanks @strengejacke ! Correct me if I'm wrong, but the constraint here is that we don't return the data in the iv_robust object, right @lukesonnet ?

acoppock avatar May 13 '19 20:05 acoppock

It would be nice to have at least residuals() and logLik() implemented (I think logLik needs residuals), to compute indices like AIC etc.

the log-liklihood only needs the RSS iirc, which is provided (or else RSS/df is reported as a variance, I forget which).

nfultz avatar May 13 '19 20:05 nfultz

Thanks @strengejacke ! Correct me if I'm wrong, but the constraint here is that we don't return the data in the iv_robust object, right @lukesonnet ?

Not sure, but using the insight package, I can access the data:

library("insight")
library("estimatr")

m1 <- iv_robust(mpg ~ gear + cyl | carb + wt, data = mtcars)
get_data(m1)
#>                      mpg carb + wt gear cyl carb    wt
#> Mazda RX4           21.0      TRUE    4   6    4 2.620
#> Mazda RX4 Wag       21.0      TRUE    4   6    4 2.875
#> Datsun 710          22.8      TRUE    4   4    1 2.320
#> Hornet 4 Drive      21.4      TRUE    3   6    1 3.215
#> Hornet Sportabout   18.7      TRUE    3   8    2 3.440
#> Valiant             18.1      TRUE    3   6    1 3.460
#> Duster 360          14.3      TRUE    3   8    4 3.570
#> Merc 240D           24.4      TRUE    4   4    2 3.190
#> Merc 230            22.8      TRUE    4   4    2 3.150
#> Merc 280            19.2      TRUE    4   6    4 3.440
#> Merc 280C           17.8      TRUE    4   6    4 3.440
#> Merc 450SE          16.4      TRUE    3   8    3 4.070
#> (... truncated)

Created on 2019-05-14 by the reprex package (v0.2.1)

strengejacke avatar May 14 '19 09:05 strengejacke

This is some code I'm currently using:

#' @importFrom insight get_response
#' @export
residuals.iv_robust <- function(object, ...) {
  insight::get_response(object) - object$fitted.values
}

And then logLik() should work, code adapted from R-stats:

#' @importFrom stats residuals
#' @export
logLik.iv_robust <- function(object, ...) {
  res <- stats::residuals(object)
  p <- object$rank
  w <- object$weights

  N <- length(res)

  if (is.null(w)) {
    w <- rep.int(1, N)
  } else {
    excl <- w == 0
    if (any(excl)) {
      res <- res[!excl]
      N <- length(res)
      w <- w[!excl]
    }
  }
  N0 <- N

  val <- 0.5 * (sum(log(w)) - N * (log(2 * pi) + 1 - log(N) + log(sum(w * res^2))))

  attr(val, "nall") <- N0
  attr(val, "nobs") <- N
  attr(val, "df") <- p + 1
  class(val) <- "logLik"

  val
}

strengejacke avatar May 14 '19 09:05 strengejacke

Thanks for the pointer to insight! might be just the thing.

acoppock avatar May 15 '19 15:05 acoppock

Necroing this thread, please see also http://discuss.declaredesign.org/t/anova-with-lm-robust-objects/127/2 where I hijacked residuals + deviance generics from lm for someone.

nfultz avatar Aug 19 '19 20:08 nfultz

Re-necomancing this thread to add my suggestions from above; it would be very good to have all of the "standard" lm accessor functions included for compatibility with other packages that use them. For example, car uses them extensively for hypothesis testing and would be easy to include the lm_robust objects if it had the accessors.

If I have time later this summer, I might try including it in the package, if there's no one currently working on that issue.

jlgraves-ubc avatar May 18 '21 22:05 jlgraves-ubc

Dropping in to say that it would be very convenient to implement the anova.lm_robust() method

alexpghayes avatar Jun 09 '22 19:06 alexpghayes

@alexpghayes I haven't tried, but if drop1 somehow works I think you can recursively call it + combine that with the workaround in the thread I linked before using anova.lmlist.

From a quick look, anova.lm won't work because it reads data from the internal data structure rather than using any accessors itself. I've had limited success getting R core to fix those types of issues. In an ideal world we could just inherit from the lm class and things magically work without reinventing the wheel :|.

nfultz avatar Jun 09 '22 21:06 nfultz

@alexpghayes I haven't tried, but if drop1 somehow works I think you can recursively call it + combine that with the workaround in the thread I linked before using anova.lmlist.

Thanks. Will this approach work with multiple outcomes?

From a quick look, anova.lm won't work because it reads data from the internal data structure rather than using any accessors itself. I've had limited success getting R core to fix those types of issues. In an ideal world we could just inherit from the lm class and things magically work without reinventing the wheel :|.

I agree that this is frustrating, and do not condone the current design of stats, but I do think it is worth briefly pointing out that it doesn't necessarily make statistical sense for anova() to just call generics, since sampling distributions are specific to individual estimators. Based on my experiences maintaining broom, I personally think it is good practice to silo estimator-specific code as much as possible in order to maximize statistical correctness.

alexpghayes avatar Jun 09 '22 22:06 alexpghayes

It might work, but IIRC nobdy has ever asked about mlm compatibility before or tried.

Also do you need anova / manova multivariate tests or aov per-outcome tests ?

BTW thought you were UCLA Alex Hayes but I guess not? Anyways, welcome.

nfultz avatar Jun 09 '22 22:06 nfultz

if anyone is in need to calculate AIC

myAIC <- function(estimatr_fit) {
  
  2*(estimatr_fit$k+1) + estimatr_fit$nobs * (log(2*pi* (1-estimatr_fit$r.squared)*estimatr_fit$tss/estimatr_fit$nobs ) + 1)
  
}

albertostefanelli avatar Apr 19 '23 16:04 albertostefanelli