FixedEffectModels.jl
FixedEffectModels.jl copied to clipboard
Differences in R2 and tss compared to GLM
Hello,
Thank you for writing this package I wanted to use it as a go faster drop in replacement for the GLM.jl package. As a test I tried fitting the same model using both GLM and FixedEffectsModels and noticed differences in the R^2 reported. I have tried to replicate the difference in a minimum working example which I have attached. I have only been able to show a small difference which comes down to the following: @formula(obs~0+year+treatment)
vs @formula(obs~0+year+fe(treatment))
. It may be that my understanding of the the statistical model difference between the two formula is incomplete in which case a faq or a note in the documentation may help someone in the future. Here is my small working example which I hope might be the basis for a test.
using CUDA
using FixedEffectModels, GLM
using DataFrames, CategoricalArrays
using Test
@testset "Compare FE to GLM" begin
treatmentD = ["A","A","A","B","B","B","B"]
yearD = [0,1,2, 0,1,2,3]
testframe=DataFrame(treatment = treatmentD, year = yearD,
obs = rand(length(treatmentD)) )
testframe[!,:treatment]=categorical(testframe[:,:treatment])
ols1=fit(LinearModel,@formula(obs~0+year+treatment),testframe)
fesol=reg(testframe,@formula(obs~0+year+fe(treatment)),save=true)
@test isapprox(r2(fesol),r2(ols1))
@test isapprox(adjr2(fesol),adjr2(ols1))
@test isapprox(coef(fesol)[1],coef(ols1)[1])
@test isapprox(residuals(fesol),residuals(ols1))
nofesol=reg(testframe,@formula(obs~0+year+treatment),save=true)
@test isapprox(r2(nofesol),r2(ols1))
@test isapprox(adjr2(nofesol),adjr2(ols1))
@test isapprox(coef(nofesol),coef(ols1))
@test isapprox(residuals(nofesol,testframe),residuals(ols1))
@test isapprox(nofesol.tss, fesol.tss) #this fails
@test isapprox(fesol.tss, nulldeviance(ols1))
@test isapprox(nofesol.rss, fesol.rss)
@test isapprox(nofesol.rss, deviance(ols1))
nofesolgpu=reg(testframe,@formula(obs~0+year+treatment),method=:gpu, double_precision=true,save=true)
@test isapprox(r2(nofesolgpu),r2(ols1))
@test isapprox(adjr2(nofesolgpu),adjr2(ols1))
fesolgpu=reg(testframe,@formula(obs~0+year+fe(treatment)),method=:gpu, double_precision=true,save=true)
@test isapprox(r2(fesolgpu),r2(ols1))
@test isapprox(adjr2(fesolgpu),adjr2(ols1))
@test isapprox(r2(fesolgpu),r2(fesol))
@test isapprox(adjr2(fesolgpu),adjr2(fesol))
@test isapprox(coef(fesolgpu),coef(fesol))
@test isapprox(r2(nofesolgpu),r2(nofesol))
@test isapprox(adjr2(nofesolgpu),adjr2(nofesol))
@test isapprox(coef(fesolgpu),coef(fesol))
end
I am not sure this is an error. The definition of r2 is ambiguous in the absence of an intercept FixedEffectModels reports the same thing as Stata.
See also: https://github.com/FixedEffects/FixedEffectModels.jl/issues/173