estimatr

NA coefficients lead to NA F-statistic

Open • NickCH-K opened this issue 5 years ago • 4 comments

I imagine this is pretty low on the priority list, but if some coefficients are NA (for example, because weights induce perfect collinearity), neither lm_robust nor iv_robust calculates the F-statistic, even though lm handles this case fine. There are some strange quirks when you play around with this:

library(tidyverse)
library(estimatr)

o <- 300
tb <- tibble(w = rnorm(o),
             group = rep(c('A','B','C'),o/3),
             #note inclusion of 0 weights that line up with group identifier
             weights = rep(c(0,1,2),o/3),
             z = rnorm(o),
             nu = rnorm(o),
             eps = rnorm(o)) %>%
  mutate(x = z*2 + w + nu) %>%
  mutate(y = x*3 + w + eps)

# Successful coefficient and F-statistic calculation
# Group dummies in the first (iv) and/or second (lm, iv) stage, with the collinear dummy dropped: ok
summary(lm(y~x+factor(group),data=tb))
summary(lm_robust(y~x+factor(group),data=tb,se_type='classical'))
summary(iv_robust(y~x+factor(group)|z+factor(group),data=tb,se_type='classical',diagnostics=TRUE))
# Zero weights on their own (no group dummies): ok
summary(lm(y~x,data=tb,weights=weights))
summary(lm_robust(y~x,data=tb,weights=weights,se_type='classical'))
summary(iv_robust(y~x|z,data=tb,weights=weights,se_type='classical',diagnostics = TRUE))

# When the collinearity is introduced by the zero weights
# (in this case, a second dummy should be dropped),
# lm reports NA coefficients/SEs/etc. instead of dropping the term, but F is fine
summary(lm(y~x+factor(group),data=tb,weights=weights))
summary(lm(y~x*factor(group),data=tb,weights=weights))

# Under lm_robust, the F-statistic is fine in the additive version below but not
# in the interaction version
summary(lm_robust(y~x+factor(group),data=tb,weights=weights))
summary(lm_robust(y~x*factor(group),data=tb,weights=weights))

# Under IV, the additive version produces an F for the first stage but not the second
summary(iv_robust(y~factor(group)+x|z+factor(group),data=tb,weights=weights,se_type='classical',diagnostics=TRUE))
# ...but, oddly, if the collinearity only affects the first stage, the first-stage F is not computed
summary(iv_robust(y~x|z+factor(group),data=tb,weights=weights,se_type='classical',diagnostics=TRUE))

# And under IV, the interaction breaks both the first- and second-stage F
summary(iv_robust(y~x*factor(group)|z*factor(group),data=tb,weights=weights,se_type='classical',diagnostics=TRUE))
# If the collinearity only affects the first stage, the second-stage F is fine
summary(iv_robust(y~x|z*factor(group),data=tb,weights=weights,se_type='classical',diagnostics=TRUE))
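For reference, one way to avoid an NA F-statistic is to run the overall Wald test only over the coefficients that were actually estimated, skipping the aliased (NA) ones. The base-R sketch below illustrates that idea under stated assumptions; it is not estimatr's internal code. `wald_f_drop_na()` is a hypothetical helper, and it uses the classical `vcov()` from `lm` so the result can be checked against the F-statistic that `summary.lm` reports for the weighted additive model in the reproduction. Passing a robust covariance matrix with the same row and column names instead should give the corresponding robust Wald F.

```r
# Hypothetical helper (not part of estimatr or stats): overall Wald F-test
# of all non-intercept coefficients, skipping aliased (NA) coefficients.
wald_f_drop_na <- function(beta, vcov_mat, intercept = "(Intercept)") {
  keep <- names(beta)[!is.na(beta) & names(beta) != intercept]
  b <- beta[keep]
  # Index by name so this works whether or not vcov_mat has NA rows/cols
  V <- vcov_mat[keep, keep, drop = FALSE]
  sum(b * solve(V, b)) / length(b)
}

# Same data-generating process as the reproduction, in base R
set.seed(1)
o <- 300
tb <- data.frame(
  w = rnorm(o),
  group = rep(c("A", "B", "C"), o / 3),
  weights = rep(c(0, 1, 2), o / 3),  # group A gets zero weight
  z = rnorm(o),
  nu = rnorm(o),
  eps = rnorm(o)
)
tb$x <- tb$z * 2 + tb$w + tb$nu
tb$y <- tb$x * 3 + tb$w + tb$eps

# Zero weights on group A make one group dummy aliased (NA)
fit <- lm(y ~ x + factor(group), data = tb, weights = weights)
print(coef(fit))

f_manual <- wald_f_drop_na(coef(fit), vcov(fit))
f_lm <- unname(summary(fit)$fstatistic["value"])
cat("manual Wald F:", f_manual, " summary.lm F:", f_lm, "\n")
```

Because the NA coefficient is simply left out, the numerator degrees of freedom equal the number of estimated slopes (here 2), which matches the `numdf` that `summary.lm` reports.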

NickCH-K · Jun 27 '19 23:06

Thanks. The F-test code isn't my finest hour, so this is a good reason to revisit it.

lukesonnet · Jul 10 '19 22:07

Checked in estimatr 0.20.0: the bug is still there (as I'd expect, given the issue is still open, but I figured I'd check).

NickCH-K · Sep 10 '19 23:09

Sorry Nick, the last patch was for some major problems affecting all users. We're triaging now.

lukesonnet · Sep 12 '19 16:09

Oh, no problem. Just thought I'd check since the new version was out.

NickCH-K · Sep 12 '19 16:09