FixedEffectModels.jl
Bug for residuals
It seems that if you don't specify the intercept as a fixed effect, residuals can't be retrieved. The following is a MWE:
using DataFrames, FixedEffectModels
df = DataFrame(a = rand(100), b = rand(100), c = rand(100), d = ones(100))
ols = FixedEffectModels.reg(df, @formula(a ~ b + c), save = true)
residuals(ols) # this gives an error
ols = FixedEffectModels.reg(df, @formula(a ~ b + c + fe(d)), save = true)
residuals(ols) # this works fine
Maybe it's worth noting this in the documentation rather than fixing it? After all, it's a package for the inclusion of fixed effects. But sometimes you want to compare results with a simple OLS.
I also just noticed that adding an intercept in this way crashes predict if there are missing values (it works fine if there are no missing values). The MWE is:
df = DataFrame(a = rand(100), b = rand(100), c = rand(100), d = ones(100))
allowmissing!(df)
df.b[[30,40]] .= missing
ols = FixedEffectModels.reg(df, @formula(a ~ b + c + fe(d)), save = true)
residuals(ols) #this works fine
predict(ols,df) #this doesn't work
ols = FixedEffectModels.reg(df, @formula(a ~ b + c), save = true)
residuals(ols) #this gives error
predict(ols,df) #this works fine
I have just fixed the first issue. @nilshg: could you have a look at the second one?
So what happens here is that with missing values we are getting two predicted fixed effects for the one group included in the model:
julia> unique(ols.fe)
2×2 DataFrame
 Row │ d         fe_d
     │ Float64?  Float64?
─────┼────────────────────
   1 │      1.0  0.608499
   2 │      1.0  missing
This means that when the fixed effects are joined onto the original data with leftjoin, so that they can be added to the predicted response, the data set is duplicated: every row with a fixed effect value of 1 turns into two rows, one with a fixed effect of about 0.6 and one with a missing fixed effect. We then get an error because the nonmissings vector used to pick the rows without missing data is only half as long as the data set we are indexing into after the duplication.
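The duplication can be reproduced outside the package with plain DataFrames.jl. This is only a sketch of the mechanism: fekeys_df and fe are hypothetical stand-ins for select(df, m.fekeys) and m.fe, and the values are made up for illustration.

```julia
using DataFrames

# Hypothetical stand-in for select(df, m.fekeys): four rows, one group.
fekeys_df = DataFrame(d = ones(4))

# Hypothetical stand-in for m.fe: same group, but one estimate is missing.
fe = DataFrame(d = Union{Missing, Float64}[1.0, 1.0, 1.0, 1.0],
               fe_d = Union{Missing, Float64}[0.6, 0.6, missing, 0.6])

# unique keeps both (1.0, 0.6) and (1.0, missing), so the lookup table
# has two rows for the single key d = 1.0.
lookup = unique(fe)

# Each left row matches both lookup rows, so the row count doubles.
joined = leftjoin(fekeys_df, lookup; on = :d, makeunique = true, matchmissing = :equal)

println((nrow(fekeys_df), nrow(lookup), nrow(joined)))
```

Any mask built from the original number of rows no longer lines up with the joined table, which is consistent with the indexing error described above.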
I'm a little surprised that we end up with a missing fixed effect here, as I would have expected rows with missing observations to get dropped. The immediate issue would be solved by replacing
fes = leftjoin(select(df, m.fekeys), unique(m.fe); on = m.fekeys, makeunique = true, matchmissing = :equal)
with
fes = leftjoin(select(df, m.fekeys), dropmissing(unique(m.fe)); on = m.fekeys, makeunique = true, matchmissing = :equal)
but it feels like a fix someplace before we reach predict might be more appropriate?
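As a rough check of the proposed workaround, the sketch below applies dropmissing to a lookup table before the join. The tables are hypothetical stand-ins for select(df, m.fekeys) and unique(m.fe), not the package's actual objects.

```julia
using DataFrames

# Hypothetical stand-in for select(df, m.fekeys).
fekeys_df = DataFrame(d = ones(4))

# Hypothetical stand-in for unique(m.fe), with the spurious missing row.
lookup = DataFrame(d = Union{Missing, Float64}[1.0, 1.0],
                   fe_d = Union{Missing, Float64}[0.6, missing])

# dropmissing removes every row that has a missing value in any column,
# leaving a single row per key here.
fes = leftjoin(fekeys_df, dropmissing(lookup); on = :d, makeunique = true, matchmissing = :equal)

println((nrow(fekeys_df), nrow(fes)))
```

Note that dropmissing with no column argument also drops rows whose key is missing, which may or may not be desirable if missing keys are meant to match under matchmissing = :equal.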