
Benchmark against Python's pyhdfe

Open azev77 opened this issue 4 years ago • 13 comments

In addition to Stata's reghdfe & R's felm: pyhdfe (https://github.com/jeffgortmaker/pyhdfe)

azev77 avatar Feb 20 '20 03:02 azev77

Happy to do it if you can send us a code snippet that reproduces the benchmark regressions.

eloualiche avatar Feb 24 '20 19:02 eloualiche

Also maybe the new(ish) R fixest package: https://github.com/lrberge/fixest/wiki

grantmcdermott avatar Mar 06 '20 18:03 grantmcdermott

The R package fixest seems to dominate other alternatives. Not sure if using GPU will change the result.

waynelapierre avatar Dec 11 '20 09:12 waynelapierre

@azev77, I think linearmodels is the best Python alternative to FixedEffectModels.jl in terms of feature richness. econtools is also useful, but is closer to Stata's areg in functionality (i.e., two-way clustered standard errors and multiple absorption factors are not available).

@eloualiche, I've included below a reproduction of the benchmark example for Python using linearmodels. I'm not 100% sure about the comparability of @time in Julia and time in Python.

Also, note that the R and Stata examples in the existing benchmark do not square cos(id2), but the Julia one does. I followed the Julia version.

@bashtage may have thoughts on whether the code below best represents linearmodels.

import numpy as np
import pandas as pd
import linearmodels.iv as lmiv
import statsmodels.api as sm
import time

N = 10_000_000
K = 100

df = pd.DataFrame(
    {
        "id1": np.random.choice(np.arange(1, int(N / K) + 1), size=N),
        "id2": np.random.choice(np.arange(1, K + 1), size=N),
        "x1": np.random.uniform(size=N),
        "x2": np.random.uniform(size=N),
    }
)

df["y"] = (
    3 * df["x1"]
    + 2 * df["x2"]
    + np.sin(df["id1"])
    + np.cos(df["id2"]) ** 2
    + np.random.uniform(size=N)
)

df["id1"] = pd.Categorical(df["id1"])
df["id2"] = pd.Categorical(df["id2"])

# reg y x1 x2
dep = df["y"]
exog = sm.add_constant(df[["x1", "x2"]])

start = time.time()
lmiv.IV2SLS(dep, exog, None, None).fit()
end = time.time()
print(end - start)

# areg y x1 x2, a(id1)
absorb = pd.DataFrame(df["id1"])

start = time.time()
lmiv.AbsorbingLS(dep, exog, absorb=absorb).fit()
end = time.time()
print(end - start)

# reghdfe y x1 x2, a(id1 id2)
absorb = df[["id1", "id2"]]

start = time.time()
lmiv.AbsorbingLS(dep, exog, absorb=absorb).fit()
end = time.time()
print(end - start)

# reg y x1 x2, cl(id1)
clusters = df["id1"]

start = time.time()
lmiv.IV2SLS(dep, exog, None, None).fit(
    cov_type="clustered", clusters=clusters
)
end = time.time()
print(end - start)

# ivreg2 y x1 x2, cluster(id1 id2)
clusters = df[["id1", "id2"]]

start = time.time()
lmiv.IV2SLS(dep, exog, None, None).fit(
    cov_type="clustered", clusters=clusters
)
end = time.time()
print(end - start)

vikjam avatar Dec 27 '20 18:12 vikjam
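As a cross-check on what these absorbing regressions are doing under the hood, here is a minimal NumPy sketch (not from the thread; sizes and variable names are illustrative) of the within / Frisch-Waugh-Lovell transformation: alternately demean y, x1, x2 within the id1 and id2 groups until convergence, then run plain OLS on the transformed data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_id1, n_id2 = 50_000, 500, 100
id1 = rng.integers(0, n_id1, size=n)
id2 = rng.integers(0, n_id2, size=n)
x1 = rng.uniform(size=n)
x2 = rng.uniform(size=n)
y = 3 * x1 + 2 * x2 + np.sin(id1) + np.cos(id2) ** 2 + rng.uniform(size=n)

def demean(v, ids):
    # subtract each group's mean from v
    means = np.bincount(ids, weights=v) / np.bincount(ids)
    return v - means[ids]

# alternating projections: sweep the two fixed-effect dimensions
# until the transformed columns stop changing
Z = np.column_stack([y, x1, x2])
for _ in range(200):
    old = Z.copy()
    for ids in (id1, id2):
        for j in range(Z.shape[1]):
            Z[:, j] = demean(Z[:, j], ids)
    if np.max(np.abs(Z - old)) < 1e-10:
        break

# OLS on the demeaned data recovers the slope coefficients
beta, *_ = np.linalg.lstsq(Z[:, 1:], Z[:, 0], rcond=None)
print(beta)  # close to [3, 2]
```

This is essentially what reghdfe, lfe, fixest, and FixedEffectModels all do, with much more sophisticated acceleration of the demeaning step.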

https://github.com/gzervas/torchfe

azev77 avatar Dec 31 '20 03:12 azev77

https://github.com/gzervas/torchfe

Cool, I’ll check this out too!

vikjam avatar Dec 31 '20 04:12 vikjam

@vikjam This seems about right. statsmodels also supports OLS with 2-way clustering.

bashtage avatar Dec 31 '20 13:12 bashtage

I got around to it. Here are the benchmarks (in seconds) on my MacBook: [screenshot of benchmark timing table]

a few notes:

  1. fixest's feols() is much faster than lfe's felm()
  2. it is well known that Julia functions should not be benchmarked with @time; more robust methods such as @btime or @benchmark from BenchmarkTools.jl are recommended
  3. I don't know of an appropriate method in R besides system.time()
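On the Python side, the rough analogue of @btime is the standard-library timeit module: it runs the statement repeatedly, and the minimum over several repeats is the usual noise-robust estimate, unlike a single time.time() delta (a generic sketch, not tied to the benchmark code above).

```python
import timeit
import numpy as np

x = np.random.uniform(size=1_000_000)

# run the statement 5 times per trial, over 5 trials; the minimum trial
# is the one least contaminated by other system activity
trials = timeit.repeat(lambda: np.sin(x).sum(), number=5, repeat=5)
print(min(trials) / 5)  # best-case seconds per call
```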

Julia code:

using DataFrames, CategoricalArrays, FixedEffectModels, BenchmarkTools
N = 10000000
K = 100
id1 = rand(1:div(N, K), N)  # div keeps the range integer-valued (N/K would make it a float range)
id2 = rand(1:K, N)
x1 = randn(N)
x2 = randn(N)
y = 3 .* x1 .+ 2 .* x2 .+ sin.(id1) .+ cos.(id2).^2 .+ randn(N)
df = DataFrame(id1 = categorical(id1), id2 = categorical(id2), x1 = x1, x2 = x2, y = y)
@btime reg(df, @formula(y ~ x1 + x2))
@btime reg(df, @formula(y ~ x1 + x2 + fe(id1)))
@btime reg(df, @formula(y ~ x1 + x2 + fe(id1) + fe(id2)))

R code:

N = 10000000
K = 100
df = data.frame(
  id1 =  as.factor(sample(N/K, N, replace = TRUE)),
  id2 =  as.factor(sample(K, N, replace = TRUE)),
  x1 =  runif(N),
  x2 =  runif(N)
)
df[, "y"] =  3 * df[, "x1"] + 2 * df[, "x2"] + sin(as.numeric(df[, "id1"])) + cos(as.numeric(df[, "id2"])) + runif(N)

library(lfe)
system.time(felm(y ~ x1 + x2, df))
system.time(felm(y ~ x1 + x2|id1, df))
system.time(felm(y ~ x1 + x2|id1 + id2, df))

#fixest estimates Clustered SE w/ a separate command.
library(fixest)
system.time(feols(y ~ x1 + x2, df))
system.time(feols(y ~ x1 + x2|id1, df))
system.time(feols(y ~ x1 + x2|id1 + id2, df))

azev77 avatar Feb 03 '21 22:02 azev77

Great benchmark @azev77!

By the way, there's an issue in fixest for MAC users that prevents multithreading and hence reduces performance substantially (see https://github.com/lrberge/fixest/issues/63). By any chance, are you running fixest with multithreaded mode on (you can check with getFixest_nthreads())?

lrberge avatar Mar 02 '21 13:03 lrberge

@lrberge not sure. I just reran on a Windows computer at work. It looks like lfe (R) is slower than FixedEffectModels.jl, which in turn is slower than fixest (R).

@eloualiche @matthieugomez this gap feels a bit suspect in light of @s-broda's ARCHModels.jl benchmarks:

We find that ARCHModels.jl is faster than Python’s ARCH package by a factor of about 6, faster than Matlab’s Econometrics Toolbox by a factor of about 12, and faster than R’s rugarch package by a factor of about 50. This is despite the fact that all packages (except ARCHModels.jl) implement the likelihood function in a compiled low-level language.

Are there missed opportunities to get better performance in FixedEffectModels.jl?

R

> library(lfe)
> system.time(felm(y ~ x1 + x2, df))
   user  system elapsed 
   2.04    0.42    2.47 
> system.time(felm(y ~ x1 + x2|id1, df))
   user  system elapsed 
  27.38    0.67   27.86 
> system.time(felm(y ~ x1 + x2|id1 + id2, df))
   user  system elapsed 
  14.74    0.91   13.88 
> #fixest estimates Clustered SE w/ a separate command.
> library(fixest)
> system.time(feols(y ~ x1 + x2, df))
   user  system elapsed 
   0.64    0.08    0.56 
> system.time(feols(y ~ x1 + x2|id1, df))
   user  system elapsed 
   2.57    0.29    2.56 
> system.time(feols(y ~ x1 + x2|id1 + id2, df))
   user  system elapsed 
   5.20    0.43    3.61 

Julia

julia> @btime reg(df, @formula(y ~ x1 + x2))
  918.930 ms (559 allocations: 690.27 MiB)
julia> @btime reg(df, @formula(y ~ x1 + x2 + fe(id1)))
  3.888 s (1371 allocations: 891.50 MiB)
julia> @btime reg(df, @formula(y ~ x1 + x2 + fe(id1) + fe(id2)))
  9.126 s (3213 allocations: 1006.07 MiB)

azev77 avatar Mar 03 '21 02:03 azev77

I get similar timings for FixedEffectModels and feols on a mac with ~~only one thread~~ four threads after using drop_singletons = false and tol = 1e-6 (which correspond to the default values of feols).

N = 10000000
K = 100
df = data.frame(
  id1 =  as.factor(sample(N/K, N, replace = TRUE)),
  id2 =  as.factor(sample(K, N, replace = TRUE)),
  x1 =  runif(N),
  x2 =  runif(N)
)
df[, "y"] =  3 * df[, "x1"] + 2 * df[, "x2"] + sin(as.numeric(df[, "id1"])) + cos(as.numeric(df[, "id2"])) + runif(N)
library(fixest)
system.time(feols(y ~ x1 + x2, df))
#   user  system elapsed 
#  0.870   0.017   0.456
system.time(feols(y ~ x1 + x2|id1, df))
#   user  system elapsed 
#  2.900   0.089   1.931 
system.time(feols(y ~ x1 + x2|id1 + id2, df))
#   user  system elapsed 
#  4.380   0.137   2.802 
using DataFrames, CategoricalArrays, FixedEffectModels
N = 10000000
K = 100
id1 = rand(1:div(N, K), N)
id2 = rand(1:K, N)
x1 =  randn(N)
x2 =  randn(N)
y= 3 .* x1 .+ 2 .* x2 .+ sin.(id1) .+ cos.(id2).^2 .+ randn(N)
df = DataFrame(id1 = categorical(id1), id2 = categorical(id2), x1 = x1, x2 = x2, y = y)
@time reg(df, @formula(y ~ x1 + x2))
#  0.587668 seconds (554 allocations: 690.274 MiB, 2.13% gc time)
@time reg(df, @formula(y ~ x1 + x2 + fe(id1)), drop_singletons = false)
#     1.003298 seconds (230.64 k allocations: 903.669 MiB, 4.33% gc time, 13.68% compilation time)
@time reg(df, @formula(y ~ x1 + x2 + fe(id1) + fe(id2)), drop_singletons = false, tol = 1e-6)
#  2.066189 seconds (330.34 k allocations: 1.000 GiB, 2.58% gc time, 9.00% compilation time)
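To see why tol matters for these timings, here is a hypothetical NumPy sketch (illustrative sizes, not the benchmark data) counting how many alternating-demeaning sweeps a two-way problem needs at different tolerances; the demeaning loop mimics the within transformation. A looser tolerance converges in fewer sweeps, which is part of why feols-style defaults look faster.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
id1 = rng.integers(0, 1000, size=n)
id2 = rng.integers(0, 100, size=n)
v = rng.normal(size=n)

def demean(v, ids):
    # subtract each group's mean from v
    means = np.bincount(ids, weights=v) / np.bincount(ids)
    return v - means[ids]

def sweeps_to_converge(v, tol):
    # count alternating id1/id2 demeaning sweeps until the vector is stable
    z = v.copy()
    for it in range(1, 1000):
        old = z.copy()
        z = demean(demean(z, id1), id2)
        if np.max(np.abs(z - old)) < tol:
            return it
    return None

for tol in (1e-6, 1e-8, 1e-10):
    print(tol, sweeps_to_converge(v, tol))
```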

matthieugomez avatar Apr 11 '21 04:04 matthieugomez

There was a discussion about this on Twitter with @floswald. I think it's because Julia doesn't do multithreaded broadcasting yet...

azev77 avatar Apr 11 '21 05:04 azev77

I have pushed a PR for feols changing the default values of tol and drop_singletons: https://github.com/lrberge/fixest/pull/128

matthieugomez avatar Apr 12 '21 22:04 matthieugomez