estimatr
"std::bad_alloc" error from lm_robust with CR2 standard errors
I have run into a reproducible "std::bad_alloc" error when fitting a regression with CR2 standard errors. If I use other standard error types or remove clustering, it works without problems. I don't know whether this is a numerical stability issue or something else. I had to zip the data file to upload it to GitHub, so please unzip it first.
Minimal reproducible example:
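library(haven)     # provides read_dta()
library(estimatr)  # provides lm_robust()
my_df <- read_dta("maindata.dta")
# With clusters supplied, lm_robust() uses se_type = "CR2" by default
reg <- lm_robust(guerrilla ~ cofvalue + as.factor(year) + as.factor(coddist),
                 data = my_df, clusters = as.factor(coddist))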
Error:
Error in lm_variance(X = if (se_type %in% c("HC2", "HC3", "CR2") && res) cbind(data[["x"]], : std::bad_alloc
Given the std:: prefix, my assumption is that the error comes from further down the Rcpp rabbit hole.
I can replicate this on my Windows desktop (16 GB RAM, nothing else running), and as best I can tell the R process does not hit the RAM limit. It also happens on my MacBook, and another user could replicate it on a loaner Mac and a Windows laptop.
I guess the two issues are:
- What's causing this and does it suggest any broader class of regressions that won't run?
- If it's a resource constraint, a more informative error message would probably be useful.
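On the second point: Rcpp hands C++ exceptions such as std::bad_alloc back to R as ordinary errors (which is why the failure above shows up as an R `Error in lm_variance(...)`), so they can be caught with `tryCatch()`. A minimal sketch of a wrapper that re-raises that failure with a more explanatory message; `with_alloc_hint()` is a hypothetical helper, not part of estimatr.

```r
# Hypothetical helper (not an estimatr function): evaluate an expression and,
# if it fails with a C++ allocation error, re-raise it with a clearer message.
with_alloc_hint <- function(expr) {
  tryCatch(
    expr,  # lazily evaluated here, so errors are caught by the handler below
    error = function(e) {
      if (grepl("bad_alloc", conditionMessage(e), fixed = TRUE)) {
        stop(
          "Memory allocation failed (std::bad_alloc). CR2 standard errors ",
          "build large per-cluster matrices; consider se_type = \"CR0\" ",
          "or check how many observations your largest cluster has.",
          call. = FALSE
        )
      }
      stop(e)  # any other error is re-raised unchanged
    }
  )
}
```

Usage would look like `with_alloc_hint(lm_robust(guerrilla ~ cofvalue + as.factor(year) + as.factor(coddist), data = my_df, clusters = as.factor(coddist)))`. Matching on the message text is crude; a fix inside the package could check sizes before allocating instead.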
Thank you for this report. It was showing up on the Solaris CRAN checks and is a high priority bug. I hope to find time to address it in 2018.
@lukesonnet Have you had any luck? Do you need help?
This seems to be a problem where we are simply trying to allocate far too much memory, because CR2 is very demanding. I think it is a resource constraint, but I'm not sure how to catch it.
I'm definitely open to solutions.
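One possible way to catch it is a pre-flight size check before any C++ allocation. This is a sketch under stated assumptions: `cr2_block_memory()` and `check_cr2_feasible()` are hypothetical names, not estimatr functions; it assumes the CR2 adjustment for each cluster g works with a dense n_g × n_g matrix, (I − H_gg)^(-1/2), stored as 8-byte doubles; and the 4 GB default limit is arbitrary.

```r
# Hypothetical pre-check (not part of estimatr). Assumes one dense
# n_g x n_g double matrix per cluster, as the CR2 adjustment would need
# if each cluster's block were formed densely.
cr2_block_memory <- function(clusters) {
  n_g <- as.vector(table(clusters))  # number of observations per cluster
  bytes <- 8 * n_g^2                 # 8 bytes per double entry
  c(largest_block_gb = max(bytes) / 1e9,
    all_blocks_gb    = sum(bytes) / 1e9)
}

# Stop with an informative message if the largest block exceeds limit_gb
check_cr2_feasible <- function(clusters, limit_gb = 4) {
  mem <- cr2_block_memory(clusters)
  if (mem[["largest_block_gb"]] > limit_gb) {
    stop(sprintf(
      "CR2 would need a %.1f GB matrix for the largest cluster (limit %.1f GB).",
      mem[["largest_block_gb"]], limit_gb), call. = FALSE)
  }
  invisible(mem)
}
```

For example, `check_cr2_feasible(my_df$coddist)` could run before `lm_robust()`. A real check would also need to account for any other large intermediate objects the CR2 computation creates.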
For what it's worth, the problem is particularly acute when you have many clusters; see the minimal working example below. This affects not only lm_robust() but also commarobust().
library(estimatr)
# Data w/ many clusters
df_bigc <- data.frame(y = rnorm(300000),
                      x = rnorm(300000),
                      cluster = rep(1:1000, 300))
# Data w/ few clusters
df_smallc <- data.frame(y = rnorm(300000),
                        x = rnorm(300000),
                        cluster = rep(1:2, 150000))
# CR0, few clusters - success!
estimatr::lm_robust(formula = y ~ x,
                    data = df_smallc,
                    clusters = cluster,
                    se_type = "CR0")
# CR0, many clusters - success!
estimatr::lm_robust(formula = y ~ x,
                    data = df_bigc,
                    clusters = cluster,
                    se_type = "CR0")
# CR2, few clusters - typically gives an error
estimatr::lm_robust(formula = y ~ x,
                    data = df_smallc,
                    clusters = cluster,
                    se_type = "CR2")
# CR2, many clusters - typically crashes
estimatr::lm_robust(formula = y ~ x,
                    data = df_bigc,
                    clusters = cluster,
                    se_type = "CR2")
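A rough calculation may help explain the few-clusters failure. Assuming, without having checked estimatr's source, that CR2 forms a dense n_g × n_g matrix of 8-byte doubles for each cluster g (the (I − H_gg)^(-1/2) adjustment), the cluster sizes in the example above imply the following. `block_gb()` is a hypothetical helper written only for this arithmetic.

```r
# Rough arithmetic, assuming one dense n_g x n_g matrix of 8-byte doubles
# per cluster; block_gb() is a hypothetical helper, not part of estimatr.
block_gb <- function(n_g) 8 * n_g^2 / 1e9

few_clusters  <- rep(150000, 2)  # df_smallc: 2 clusters of 150,000 rows
many_clusters <- rep(300, 1000)  # df_bigc: 1,000 clusters of 300 rows

block_gb(max(few_clusters))      # 180 GB for a single cluster block
sum(block_gb(few_clusters))      # 360 GB for both blocks
block_gb(max(many_clusters))     # 0.00072 GB per cluster block
sum(block_gb(many_clusters))     # 0.72 GB for all 1,000 blocks
```

On this accounting the two-cluster data would need hundreds of gigabytes, far more than 16 GB of RAM, while the per-cluster blocks for 1,000 clusters are small. If the many-cluster case still crashes, the cause is presumably some other intermediate object; for instance, anything dense in the total row count N = 300,000 would need 8 × N² bytes, about 720 GB. That is a guess, not something established in this thread.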