FixedEffectModels.jl Bug for residuals


It seems that if you don't specify the intercept as a fixed effect, residuals can't be retrieved. The following is an MWE:

```julia
using DataFrames, FixedEffectModels

df = DataFrame(a = rand(100), b = rand(100), c = rand(100), d = ones(100))

ols = FixedEffectModels.reg(df, @formula(a ~ b + c), save = true)
residuals(ols) # this gives an error

ols = FixedEffectModels.reg(df, @formula(a ~ b + c + fe(d)), save = true)
residuals(ols) # this works fine
```

Maybe it's worth noting this in the documentation rather than fixing it? After all, it's a package for estimating models with fixed effects. But sometimes you want to compare the results with a simple OLS.

Jun 29 '22 06:06 alfaromartino

I also just noticed that adding the intercept this way (as `fe(d)`) makes `predict` fail when there are missing values (it works fine when there are none). The MWE is:

```julia
using DataFrames, FixedEffectModels

df = DataFrame(a = rand(100), b = rand(100), c = rand(100), d = ones(100))
allowmissing!(df)
df.b[[30, 40]] .= missing

ols = FixedEffectModels.reg(df, @formula(a ~ b + c + fe(d)), save = true)
residuals(ols) # this works fine
predict(ols, df) # this doesn't work

ols = FixedEffectModels.reg(df, @formula(a ~ b + c), save = true)
residuals(ols) # this gives an error
predict(ols, df) # this works fine
```

Jun 29 '22 07:06 alfaromartino

I have just fixed the first issue. @nilshg: could you have a look at the second one?

Jul 18 '22 23:07 matthieugomez

What happens here is that, with missing values, we get two estimated fixed effects for the single group included in the model:

```
julia> unique(ols.fe)
2×2 DataFrame
 Row │ d         fe_d
     │ Float64?  Float64?
─────┼──────────────────────────
   1 │      1.0        0.608499
   2 │      1.0  missing
```

This means that when we `leftjoin` the fixed effects onto the original data, in order to add them to the predicted response, the data set is duplicated: every row with `d = 1` turns into one row with a fixed effect of 0.6 and one row with a missing fixed effect. We then get an error, because the `nonmissings` vector used to select the rows without missing data is only half as long as the duplicated data set we are indexing into.
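The duplication can be reproduced with DataFrames.jl alone. This is a minimal sketch, not the package's internal code: `fe_table` is a hypothetical stand-in for `unique(ols.fe)` using the values printed above, and `mask` is a hypothetical one-entry-per-original-row vector playing the role of `nonmissings`.

```julia
using DataFrames

# Original data: four observations, all in the single group d = 1.0
df = DataFrame(d = ones(4))

# Hypothetical stand-in for unique(ols.fe): the key d = 1.0 appears twice,
# once with an estimated effect and once with `missing`
fe_table = DataFrame(d = [1.0, 1.0], fe_d = [0.608499, missing])

# Each row of df matches both rows of fe_table, so leftjoin returns
# one output row per (row, match) pair and the data set doubles
joined = leftjoin(df, fe_table; on = :d, matchmissing = :equal)

println(nrow(df))      # 4
println(nrow(joined))  # 8

# Hypothetical mask sized for the original rows: logical indexing with a
# mask of the wrong length throws a BoundsError
mask = trues(nrow(df))
threw = try
    joined.fe_d[mask]
    false
catch err
    err isa BoundsError
end
println(threw)         # true
```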

I'm a little surprised that we end up with a missing fixed effect here, as I would have expected rows with missing observations to be dropped. The immediate issue would be solved by replacing

```julia
fes = leftjoin(select(df, m.fekeys), unique(m.fe); on = m.fekeys, makeunique = true, matchmissing = :equal)
```

with

```julia
fes = leftjoin(select(df, m.fekeys), dropmissing(unique(m.fe)); on = m.fekeys, makeunique = true, matchmissing = :equal)
```

but it feels like a fix someplace before we reach predict might be more appropriate?
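On the same hypothetical table, the `dropmissing` workaround can be sketched as follows. Again `fe_table` is an assumed stand-in for `unique(ols.fe)`, not output from the package; the point is only that removing the `(1.0, missing)` row leaves one match per original row.

```julia
using DataFrames

# Original data: four observations, all in the single group d = 1.0
df = DataFrame(d = ones(4))

# Hypothetical stand-in for unique(ols.fe), as in the printed output
fe_table = DataFrame(d = [1.0, 1.0], fe_d = [0.608499, missing])

# dropmissing removes every row containing a missing value,
# leaving a single (d, fe_d) pair for the group
fe_clean = dropmissing(unique(fe_table))

# Now each row of df matches exactly one row, so no duplication occurs
joined = leftjoin(df, fe_clean; on = :d, matchmissing = :equal)

println(nrow(joined))                   # 4
println(count(ismissing, joined.fe_d))  # 0
```

Called without a column argument, `dropmissing` drops rows with a missing value in any column, key columns included, so this only masks the symptom at the join rather than explaining where the missing estimate comes from.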

Jul 19 '22 14:07 nilshg
