estimatr
estimatr copied to clipboard
Catalog and discuss S3 methods that lm has
Have added
- [x] confint.lm
- [x] nobs.lm
- [x] predict.lm - fix this to use the right variance
- [x] print.lm
- [x] summary.lm
- [x] tidy.lm
- [x] vcov.lm
Will add
- [ ] anova.lm - may require the residuals or possibly more computation on the residuals
- [ ] formula.lm
- [ ] labels.lm - gets all RHS variable names from formula
- [ ] variable.names.lm (free call to
coefficient_name
) - [ ] weights.lm - we actually already return weights when they are supplied as
predict.lm
needs them
Would be happy to add
- [ ] simulate.lm - Simulates the outcome according to the fits and estimated noise in the model
- The following are all about influence of individual data points, would necessitate keeping the
n
size objects (and maybe adding some computation in some cases where the hat matrix isn't free)- [ ] cooks.distance.lm
- [ ] plot.lm
- [ ] hatvalues.lm
- [ ] influence.lm
- [ ] dfbeta.lm
- [ ] dfbetas.lm
- [ ] rstandard.lm
- [ ] rstudent.lm
- Things that may require the residuals or possibly more computation on the residuals
- [ ] residuals.lm
- [ ] logLik.lm
- [ ] deviance.lm
- [ ] extractAIC.lm, regular AIC is what fultz prefers
- [ ] model.frame.lm - requires keeping this obj
- [ ] model.matrix.lm - requires keeping this obj
Don't really care to add
- [ ] case.names.lm - Requires keeping obj of length
n
, can do if we keep any obj of that size - Things that require the QR object or computation on the QR object
- [ ] qr.lm
- [ ] alias.lm - returns a matrix of linearly dependent terms
- [ ] kappa.lm
- [ ] proj.lm
- [ ] effects.lm
Won't add
- [ ] add1.lm - This and drop are kind of cool; they take your model and add/remove parameters and refit and then run
anova
to compare them. Unnecessary though. - [ ] drop1.lm
- [ ] coerce,oldClass,S3-method - people can just clobber our S3 class if they want
- [ ] family.lm - we only have gaussian w/ identity link
- [ ] dummy.coef.lm - used to get dummies created by model.matrix as one coefficient, but ends up just taking 2nd factor as a dummy and dropping the rest?
- [ ] extract,lm-method - I think this is just used basically as a standin for
$
? - [ ] show,oldClass-method - To allow us to use something besides print to display
- [ ] slotsFromS3,oldClass-method
I had some students run into an issue where they wanted to use the fitted() and residuals() S3 methods and I was surprised to find out that we didn't support them. They wanted to do a fit versus residuals plot.
I assume this is because we don't retain the data required (same reason we don't implement the plot directly).
I helped them hack around it by using predict to extract the fit and residuals, but it seemed like a bit obscure.
Just wanted to surface this as user feedback, no particular necessity to respond to it.
We just added returns for these but not the S3 methods. Have them get the new version and retrieve using $ or [[
I'll add the S3s
It would be nice to have at least residuals()
and logLik()
implemented (I think logLik needs residuals), to compute indices like AIC etc.
Residuals can be computed quite easily, or is there some more complex stuff behind the scenes?
m1 <- iv_robust(mpg ~ gear + cyl | carb + wt, data = mtcars) residuals <- model.frame(m1)[[m1$outcome]] - m1$fitted.values
Thanks @strengejacke ! Correct me if I'm wrong, but the constraint here is that we don't return the data in the iv_robust object, right @lukesonnet ?
It would be nice to have at least
residuals()
andlogLik()
implemented (I think logLik needs residuals), to compute indices like AIC etc.
the log-liklihood only needs the RSS iirc, which is provided (or else RSS/df is reported as a variance, I forget which).
Thanks @strengejacke ! Correct me if I'm wrong, but the constraint here is that we don't return the data in the iv_robust object, right @lukesonnet ?
Not sure, but using the insight package, I can access the data:
library("insight")
library("estimatr")
m1 <- iv_robust(mpg ~ gear + cyl | carb + wt, data = mtcars)
get_data(m1)
#> mpg carb + wt gear cyl carb wt
#> Mazda RX4 21.0 TRUE 4 6 4 2.620
#> Mazda RX4 Wag 21.0 TRUE 4 6 4 2.875
#> Datsun 710 22.8 TRUE 4 4 1 2.320
#> Hornet 4 Drive 21.4 TRUE 3 6 1 3.215
#> Hornet Sportabout 18.7 TRUE 3 8 2 3.440
#> Valiant 18.1 TRUE 3 6 1 3.460
#> Duster 360 14.3 TRUE 3 8 4 3.570
#> Merc 240D 24.4 TRUE 4 4 2 3.190
#> Merc 230 22.8 TRUE 4 4 2 3.150
#> Merc 280 19.2 TRUE 4 6 4 3.440
#> Merc 280C 17.8 TRUE 4 6 4 3.440
#> Merc 450SE 16.4 TRUE 3 8 3 4.070
#> (... truncated)
Created on 2019-05-14 by the reprex package (v0.2.1)
This is some code I'm currently using:
#' @importFrom insight get_response
#' @export
residuals.iv_robust <- function(object, ...) {
insight::get_response(object) - object$fitted.values
}
And then logLik()
should work, code adapted from R-stats:
#' @importFrom stats residuals
#' @export
logLik.iv_robust <- function(object, ...) {
res <- stats::residuals(object)
p <- object$rank
w <- object$weights
N <- length(res)
if (is.null(w)) {
w <- rep.int(1, N)
} else {
excl <- w == 0
if (any(excl)) {
res <- res[!excl]
N <- length(res)
w <- w[!excl]
}
}
N0 <- N
val <- 0.5 * (sum(log(w)) - N * (log(2 * pi) + 1 - log(N) + log(sum(w * res^2))))
attr(val, "nall") <- N0
attr(val, "nobs") <- N
attr(val, "df") <- p + 1
class(val) <- "logLik"
val
}
Thanks for the pointer to insight! might be just the thing.
Necroing this thread, please see also http://discuss.declaredesign.org/t/anova-with-lm-robust-objects/127/2 where I hijacked residuals + deviance generics from lm for someone.
Re-necomancing this thread to add my suggestions from above; it would be very good to have all of the "standard" lm
accessor functions included for compatibility with other packages that use them. For example, car
uses them extensively for hypothesis testing and would be easy to include the lm_robust
objects if it had the accessors.
If I have time later this summer, I might try including it in the package, if there's no one currently working on that issue.
Dropping in to say that it would be very convenient to implement the anova.lm_robust()
method
@alexpghayes I haven't tried, but if drop1
somehow works I think you can recursively call it + combine that with the workaround in the thread I linked before using anova.lmlist
.
From a quick look, anova.lm
won't work because it reads data from the internal data structure rather than using any accessors itself. I've had limited success getting R core to fix those types of issues. In an ideal world we could just inherit from the lm
class and things magically work without reinventing the wheel :|.
@alexpghayes I haven't tried, but if drop1 somehow works I think you can recursively call it + combine that with the workaround in the thread I linked before using anova.lmlist.
Thanks. Will this approach work with multiple outcomes?
From a quick look, anova.lm won't work because it reads data from the internal data structure rather than using any accessors itself. I've had limited success getting R core to fix those types of issues. In an ideal world we could just inherit from the lm class and things magically work without reinventing the wheel :|.
I agree that this is frustrating, and do not condone the current design of stats
, but I do think it is worth briefly pointing out that it doesn't necessarily make statistical sense for anova()
to just call generics, since sampling distributions are specific to individual estimators. Based on my experiences maintaining broom
, I personally think it is good practice to silo estimator-specific code as much as possible in order to maximize statistical correctness.
It might work, but IIRC nobdy has ever asked about mlm
compatibility before or tried.
Also do you need anova
/ manova
multivariate tests or aov
per-outcome tests ?
BTW thought you were UCLA Alex Hayes but I guess not? Anyways, welcome.
if anyone is in need to calculate AIC
myAIC <- function(estimatr_fit) {
2*(estimatr_fit$k+1) + estimatr_fit$nobs * (log(2*pi* (1-estimatr_fit$r.squared)*estimatr_fit$tss/estimatr_fit$nobs ) + 1)
}