fixest icon indicating copy to clipboard operation
fixest copied to clipboard

Feature request: VCV for multiple outcomes

Open jonathandroth opened this issue 1 year ago • 8 comments

Laurent -- thanks again for the fantastic package!

I think it would be great if fixest could return the joint variance-covariance matrix when estimating multiple regressions at once. I.e., I run

feols(c(y1,y2) ~ x,...)

Now I'd like to know the covariance between the coefficient on x when using y1 as an outcome and y2 as an outcome. (This is what's sometime called seeming unrelated regression.) There are some wrap-arounds available (e.g. here), but it would be great if this could easily be done natively.

jonathandroth avatar Jun 23 '23 17:06 jonathandroth

@jonathandroth, here's a basic version (no clustering). Two questions and I can make this more robust:

  1. What is the correct way to account for degrees of freedom? Is it sum of number of rows in each est - sum of number of covariates in each est?
  2. Would clustered standard errors be useful? I believe you would do sum $X_i' e_i$ for each g and premultiply by $(X'X)^{-1}$ to get a $N_g$ by $k$ matrix of influence functions (instead of N by k).
library(fixest)

#' Get joint variance-covariance matrix of seemingly unrelated regressions
#' 
#' @param ests List of `feols` estimates.
#' 
#' @return Variance-covariance matrix of the combined set of coefficients in
#' all models in `ests`.
#' 
vcovSUR <- function(ests) {
  if (inherits(ests, "fixest") | inherits(ests, "fixest_multi")) { 
    ests = list(ests)
  } 
  
  inf_list <- lapply(ests, get_influence_func)
  inf_funcs <- do.call("rbind", inf_list)
  
  vcov = tcrossprod(inf_funcs)
  return(vcov)
}

get_influence_func <- function(est) {
  # (X'X)^{-1} X'e
  if (inherits(est, "fixest_multi")) {
    
    inf_list = lapply(est, function(x) { 
      (x$cov.iid / x$sigma2) %*% t(x$scores)
    })
    do.call("rbind", inf_list)

  } else if (inherits(est, "fixest")) {
    
    (est$cov.iid / est$sigma2) %*% t(est$scores)
  
  } else { # code taken partially from `sandwich`
  
    X <- stats::model.matrix(est)
    if (any(alias <- is.na(stats::coef(est)))) X <- X[, !alias, drop = FALSE]
    if (!is.null(w <- stats::weights(est))) X <- X * sqrt(w)
    scores = X * stats::resid(est)
    solve(crossprod(X), t(scores))

  }
}

mtcars$w = runif(nrow(mtcars), 0.15, 0.85)
est1 <- feols(mpg ~ wt, mtcars)
est2 <- feols(mpg ~ wt + i(cyl), mtcars)
est3 <- lm(mpg ~ wt + i(cyl), mtcars)
est_w <- feols(mpg ~ wt + i(cyl), mtcars, weights = ~w)
est_multi <- feols(c(mpg, hp)  ~ wt + i(cyl), mtcars)

vcov(est1, "hc1", ssc = ssc(adj = FALSE, cluster.adj = FALSE))
#>             (Intercept)        wt
#> (Intercept)    4.516833 -1.305717
#> wt            -1.305717  0.401577
vcov(est2, "hc1", ssc = ssc(adj = FALSE, cluster.adj = FALSE))
#>             (Intercept)         wt     cyl::6     cyl::8
#> (Intercept)   3.2700821 -0.9321293 -0.3715946  0.3477177
#> wt           -0.9321293  0.3783737 -0.2442125 -0.5329499
#> cyl::6       -0.3715946 -0.2442125  1.2716154  1.3082462
#> cyl::8        0.3477177 -0.5329499  1.3082462  1.9913114
vcovSUR(list(est1, est2))
#>             (Intercept)          wt (Intercept)         wt     cyl::6
#> (Intercept)   4.5168333 -1.30571703   3.3911740 -0.8658345 -0.6364905
#> wt           -1.3057170  0.40157696  -0.9446715  0.2607989  0.1249679
#> (Intercept)   3.3911740 -0.94467154   3.2700821 -0.9321293 -0.3715946
#> wt           -0.8658345  0.26079887  -0.9321293  0.3783737 -0.2442125
#> cyl::6       -0.6364905  0.12496790  -0.3715946 -0.2442125  1.2716154
#> cyl::8       -0.2795933  0.06152524   0.3477177 -0.5329499  1.3082462
#>                  cyl::8
#> (Intercept) -0.27959331
#> wt           0.06152524
#> (Intercept)  0.34771768
#> wt          -0.53294991
#> cyl::6       1.30824624
#> cyl::8       1.99131136

# Works with weights
vcov(est_w, "hc1", ssc = ssc(adj = FALSE, cluster.adj = FALSE))
#>             (Intercept)          wt      cyl::6     cyl::8
#> (Intercept)   5.0756301 -1.12152509 -1.59657428 -0.4062194
#> wt           -1.1215251  0.33655783  0.07906278 -0.2880801
#> cyl::6       -1.5965743  0.07906278  1.47597744  1.2937823
#> cyl::8       -0.4062194 -0.28808008  1.29378231  1.9053256
vcovSUR(est_w)
#>             (Intercept)          wt      cyl::6     cyl::8
#> (Intercept)   5.0756301 -1.12152509 -1.59657428 -0.4062194
#> wt           -1.1215251  0.33655783  0.07906278 -0.2880801
#> cyl::6       -1.5965743  0.07906278  1.47597744  1.2937823
#> cyl::8       -0.4062194 -0.28808008  1.29378231  1.9053256

# Works with multi
lapply(est_multi, function(est) vcov(est, "hc1", ssc = ssc(adj = FALSE, cluster.adj = FALSE)))
#> $`lhs: mpg`
#>             (Intercept)         wt     cyl::6     cyl::8
#> (Intercept)   3.2700821 -0.9321293 -0.3715946  0.3477177
#> wt           -0.9321293  0.3783737 -0.2442125 -0.5329499
#> cyl::6       -0.3715946 -0.2442125  1.2716154  1.3082462
#> cyl::8        0.3477177 -0.5329499  1.3082462  1.9913114
#> 
#> $`lhs: hp`
#>             (Intercept)         wt    cyl::6    cyl::8
#> (Intercept)    481.0195 -189.14317 135.81828  479.2033
#> wt            -189.1432   80.30526 -73.10224 -221.1762
#> cyl::6         135.8183  -73.10224 174.73028  230.6635
#> cyl::8         479.2033 -221.17619 230.66346  730.4108
vcovSUR(est_multi)
#>             (Intercept)         wt     cyl::6      cyl::8  (Intercept)
#> (Intercept)   3.2700821 -0.9321293 -0.3715946   0.3477177   -3.6102659
#> wt           -0.9321293  0.3783737 -0.2442125  -0.5329499    0.2846401
#> cyl::6       -0.3715946 -0.2442125  1.2716154   1.3082462    2.1105066
#> cyl::8        0.3477177 -0.5329499  1.3082462   1.9913114   -1.6623714
#> (Intercept)  -3.6102659  0.2846401  2.1105066  -1.6623714  481.0194872
#> wt            0.2846401 -0.2300793  0.7005150   2.4442424 -189.1431698
#> cyl::6        2.1105066  0.7005150 -6.1116792  -6.4158361  135.8182788
#> cyl::8       -1.6623714  2.4442424 -6.4158361 -13.5626121  479.2033017
#>                       wt     cyl::6      cyl::8
#> (Intercept)    0.2846401   2.110507   -1.662371
#> wt            -0.2300793   0.700515    2.444242
#> cyl::6         0.7005150  -6.111679   -6.415836
#> cyl::8         2.4442424  -6.415836  -13.562612
#> (Intercept) -189.1431698 135.818279  479.203302
#> wt            80.3052563 -73.102243 -221.176193
#> cyl::6       -73.1022434 174.730281  230.663464
#> cyl::8      -221.1761929 230.663464  730.410817

Created on 2023-07-10 with reprex v2.0.2

kylebutts avatar Jul 10 '23 16:07 kylebutts

@grantmcdermott @vincentarelbundock, is there a way to call marginaleffects::hypothesis providing a vector of coefficients and the variance-covariance matrix?

kylebutts avatar Jul 11 '23 01:07 kylebutts

@kylebutts maybe the FUN argument in hypotheses() can help. (Not sure.)

vincentarelbundock avatar Jul 11 '23 06:07 vincentarelbundock

Thanks, @kylebutts !

Re d.o.f., Stata's sureg command has the following options:

image

Yes, I do think clustered SEs would be a nice add-on.

jonathandroth avatar Jul 11 '23 13:07 jonathandroth

@vincentarelbundock That worked with defining custom get_coef and set_coef methods! Thanks!

kylebutts avatar Jul 12 '23 14:07 kylebutts

Cool! Is that too niche, or is it something that should be merged in the package, you think?

vincentarelbundock avatar Jul 12 '23 15:07 vincentarelbundock

@jonathandroth See https://github.com/kylebutts/vcovSUR :-)

✔️ Variance-covariance matrix estimation ✔️ Clustering (which I realized is necessary for clustering by observation id) ✔️ Support for fixest_multi objects ✔️ Hypothesis Tests from {marginaleffects} ✖︎ No small-sample degree of freedom correction (yet).

kylebutts avatar Jul 12 '23 15:07 kylebutts

Awesome! My own 2 cents is it would be useful to have a native fixest feature for doing hypothesis tests when using multiple outcomes, but I'll leave it up to the package development experts here. Either way this will be a great resource

On Wed, Jul 12, 2023 at 11:18 AM Kyle F Butts @.***> wrote:

@jonathandroth https://github.com/jonathandroth See https://github.com/kylebutts/vcovSUR :-)

✔️ Variance-covariance matrix estimation ✔️ Clustering (which I realized is necessary for clustering by observation id) ✔️ Support for fixest_multi objects ✔️ Hypothesis Tests from {marginaleffects} ✖︎ No small-sample degree of freedom correction (yet).

— Reply to this email directly, view it on GitHub https://github.com/lrberge/fixest/issues/424#issuecomment-1632736446, or unsubscribe https://github.com/notifications/unsubscribe-auth/AE6EXFCGYE76O27U4ZRVGWDXP252VANCNFSM6AAAAAAZR2EXXE . You are receiving this because you were mentioned.Message ID: @.***>

jonathandroth avatar Jul 14 '23 13:07 jonathandroth