FixedEffectModels.jl
Benchmark against Python's pyhdfe
In addition to Stata's reghdfe and R's felm: pyhdfe (https://github.com/jeffgortmaker/pyhdfe).
Happy to do it if you can send us a code snippet that reproduces the benchmark regressions.
Also maybe the new(ish) R fixest package: https://github.com/lrberge/fixest/wiki
The R package fixest seems to dominate other alternatives. Not sure if using GPU will change the result.
@azev77, I think linearmodels is the best Python alternative to FixedEffectModels.jl in terms of feature richness. econtools is also useful, but is more comparable to Stata's areg in terms of functionality (i.e., two-way clustered standard errors and multiple absorption factors are not available).
@eloualiche, I've included below a reproduction of the benchmark example for Python using linearmodels. I'm not 100% sure about the comparability of @time in Julia and time in Python. Also, note that the R and Stata examples in the benchmark did not square cos(id2), but the Julia one does; I followed the Julia version. @bashtage may have thoughts on whether the code below best represents linearmodels.
import numpy as np
import pandas as pd
import linearmodels.iv as lmiv
import statsmodels.api as sm
import time
N = 10_000_000
K = 100
df = pd.DataFrame(
    {
        "id1": np.random.choice(np.arange(1, int(N / K) + 1), size=N),
        "id2": np.random.choice(np.arange(1, K + 1), size=N),
        "x1": np.random.uniform(size=N),
        "x2": np.random.uniform(size=N),
    }
)
df["y"] = (
    3 * df["x1"]
    + 2 * df["x2"]
    + np.sin(df["id1"])
    + np.cos(df["id2"]) ** 2
    + np.random.uniform(size=N)
)
df["id1"] = pd.Categorical(df["id1"])
df["id2"] = pd.Categorical(df["id2"])
# reg y x1 x2
dep = df["y"]
exog = sm.add_constant(df[["x1", "x2"]])
start = time.time()
lmiv.IV2SLS(dep, exog, None, None).fit()
end = time.time()
print(end - start)
# areg y x1 x2, a(id1)
absorb = pd.DataFrame(df["id1"])
start = time.time()
lmiv.AbsorbingLS(dep, exog, absorb=absorb).fit()
end = time.time()
print(end - start)
# reghdfe y x1 x2, a(id1 id2)
absorb = df[["id1", "id2"]]
start = time.time()
lmiv.AbsorbingLS(dep, exog, absorb=absorb).fit()
end = time.time()
print(end - start)
# reg y x1 x2, cl(id1)
clusters = df["id1"]
start = time.time()
lmiv.IV2SLS(dep, exog, None, None).fit(
    cov_type="clustered", clusters=clusters
)
end = time.time()
print(end - start)
# ivreg2 y x1 x2, cluster(id1 id2)
clusters = df[["id1", "id2"]]
start = time.time()
lmiv.IV2SLS(dep, exog, None, None).fit(
    cov_type="clustered", clusters=clusters
)
end = time.time()
print(end - start)
https://github.com/gzervas/torchfe
Cool, I’ll check this out too!
@vikjam This seems about right. statsmodels also supports OLS with 2-way clustering.
I got around to it. Here are the benchmarks (in seconds) on my MacBook. A few notes:
- fixest's feols() is much faster than lfe's felm().
- It is well known that Julia functions should not be benchmarked with @time; it is recommended to use more robust methods such as @btime or @benchmark from BenchmarkTools.jl.
- I don't know the appropriate method in R besides system.time().
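On the same theme, the Python benchmark above timed a single call with time.time(); a closer analogue to @btime is the standard library's timeit.repeat, which runs the statement several times and takes the minimum. A sketch with a hypothetical workload standing in for the regression call:

```python
import timeit
import numpy as np

# Hypothetical workload standing in for a single regression call
x = np.random.uniform(size=1_000_000)

def workload():
    return (x * 3 + 2).sum()

# repeat() returns one total time per repetition; taking the minimum
# is the usual robust summary, analogous to BenchmarkTools' minimum time
times = timeit.repeat(workload, number=10, repeat=5)
print(min(times) / 10)  # seconds per call, best of 5 repetitions
```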
Julia code:
using DataFrames, CategoricalArrays, FixedEffectModels, BenchmarkTools
N = 10000000
K = 100
id1 = rand(1:div(N, K), N)
id2 = rand(1:K, N)
x1 = randn(N)
x2 = randn(N)
y = 3 .* x1 .+ 2 .* x2 .+ sin.(id1) .+ cos.(id2).^2 .+ randn(N)
df = DataFrame(id1 = categorical(id1), id2 = categorical(id2), x1 = x1, x2 = x2, y = y)
@btime reg(df, @formula(y ~ x1 + x2))
@btime reg(df, @formula(y ~ x1 + x2 + fe(id1)))
@btime reg(df, @formula(y ~ x1 + x2 + fe(id1) + fe(id2)))
R code:
N = 10000000
K = 100
df = data.frame(
id1 = as.factor(sample(N/K, N, replace = TRUE)),
id2 = as.factor(sample(K, N, replace = TRUE)),
x1 = runif(N),
x2 = runif(N)
)
df[, "y"] = 3 * df[, "x1"] + 2 * df[, "x2"] + sin(as.numeric(df[, "id1"])) + cos(as.numeric(df[, "id2"])) + runif(N)
library(lfe)
system.time(felm(y ~ x1 + x2, df))
system.time(felm(y ~ x1 + x2|id1, df))
system.time(felm(y ~ x1 + x2|id1 + id2, df))
#fixest estimates Clustered SE w/ a separate command.
library(fixest)
system.time(feols(y ~ x1 + x2, df))
system.time(feols(y ~ x1 + x2|id1, df))
system.time(feols(y ~ x1 + x2|id1 + id2, df))
Great benchmark @azev77!
By the way, there's an issue in fixest for Mac users that prevents multithreading and hence reduces performance substantially (see https://github.com/lrberge/fixest/issues/63). By any chance, are you running fixest with multithreading on (you can check with getFixest_nthreads())?
@lrberge not sure. I just reran on Windows computer at work.
Looks like lfe (R) is slower than FixedEffectModels.jl, which is slower than fixest (R).
@eloualiche @matthieugomez this feels a bit suspect: compare @s-broda's ARCHModels.jl:
We find that ARCHModels.jl is faster than Python’s ARCH package by a factor of about 6, faster than Matlab’s Econometrics Toolbox by a factor of about 12, and faster than R’s rugarch package by a factor of about 50. This is despite the fact that all packages (except ARCHModels.jl) implement the likelihood function in a compiled low-level language.
Are there missed opportunities to get better performance in FixedEffectModels.jl?
R
> library(lfe)
> system.time(felm(y ~ x1 + x2, df))
user system elapsed
2.04 0.42 2.47
> system.time(felm(y ~ x1 + x2|id1, df))
user system elapsed
27.38 0.67 27.86
> system.time(felm(y ~ x1 + x2|id1 + id2, df))
user system elapsed
14.74 0.91 13.88
> #fixest estimates Clustered SE w/ a separate command.
> library(fixest)
> system.time(feols(y ~ x1 + x2, df))
user system elapsed
0.64 0.08 0.56
> system.time(feols(y ~ x1 + x2|id1, df))
user system elapsed
2.57 0.29 2.56
> system.time(feols(y ~ x1 + x2|id1 + id2, df))
user system elapsed
5.20 0.43 3.61
Julia
julia> @btime reg(df, @formula(y ~ x1 + x2))
918.930 ms (559 allocations: 690.27 MiB)
julia> @btime reg(df, @formula(y ~ x1 + x2 + fe(id1)))
3.888 s (1371 allocations: 891.50 MiB)
julia> @btime reg(df, @formula(y ~ x1 + x2 + fe(id1) + fe(id2)))
9.126 s (3213 allocations: 1006.07 MiB)
I get similar timings for FixedEffectModels and feols on a Mac with ~~only one thread~~ four threads after using drop_singletons = false and tol = 1e-6 (which correspond to the default values of feols).
N = 10000000
K = 100
df = data.frame(
id1 = as.factor(sample(N/K, N, replace = TRUE)),
id2 = as.factor(sample(K, N, replace = TRUE)),
x1 = runif(N),
x2 = runif(N)
)
df[, "y"] = 3 * df[, "x1"] + 2 * df[, "x2"] + sin(as.numeric(df[, "id1"])) + cos(as.numeric(df[, "id2"])) + runif(N)
library(fixest)
system.time(feols(y ~ x1 + x2, df))
# user system elapsed
# 0.870 0.017 0.456
system.time(feols(y ~ x1 + x2|id1, df))
# user system elapsed
# 2.900 0.089 1.931
system.time(feols(y ~ x1 + x2|id1 + id2, df))
# user system elapsed
# 4.380 0.137 2.802
using DataFrames, CategoricalArrays, FixedEffectModels
N = 10000000
K = 100
id1 = rand(1:div(N, K), N)
id2 = rand(1:K, N)
x1 = randn(N)
x2 = randn(N)
y = 3 .* x1 .+ 2 .* x2 .+ sin.(id1) .+ cos.(id2).^2 .+ randn(N)
df = DataFrame(id1 = categorical(id1), id2 = categorical(id2), x1 = x1, x2 = x2, y = y)
@time reg(df, @formula(y ~ x1 + x2))
# 0.587668 seconds (554 allocations: 690.274 MiB, 2.13% gc time)
@time reg(df, @formula(y ~ x1 + x2 + fe(id1)), drop_singletons = false)
# 1.003298 seconds (230.64 k allocations: 903.669 MiB, 4.33% gc time, 13.68% compilation time)
@time reg(df, @formula(y ~ x1 + x2 + fe(id1) + fe(id2)), drop_singletons = false, tol = 1e-6)
# 2.066189 seconds (330.34 k allocations: 1.000 GiB, 2.58% gc time, 9.00% compilation time)
There was a discussion about this on Twitter w/ @floswald. I think it's because Julia doesn't do multithreaded broadcasting yet...
I have pushed a PR for feols to change the default values of tol and drop_singletons: https://github.com/lrberge/fixest/pull/128