
feols() incorrectly says FEs are collinear with regressors.

Open ja-ortiz-uniandes opened this issue 9 months ago • 11 comments

As always, thanks for developing such an incredible program. I use it almost daily!

Reproducible example

library(data.table)
library(fixest)

dt <- data.table(y = rnorm(100), x = rnorm(100), fe = rnorm(100, 1, 3))

feols(y ~ x | fe, dt)
# Error: in feols(y ~ x | fe, dt): 
# The only variable, 'x', is collinear with the fixed effects. Without doubt, your
# model is misspecified.

Expected behavior

Change the error message to cover the case where the FE variable takes one value per observation. This can be done either by amending the error so that it says "'x', is collinear with the fixed effects or there is not enough within-group variation in the FE variable" or, preferably, by adding a new error message: "insufficient within-group variation in variable 'fe' to estimate fixed effects. Please check your model specification"

Why this matters

This is not a high-priority issue. Fixing it would make the program easier to use when you accidentally write an incorrect model specification. This can be rather difficult to diagnose, particularly when regressions are nested, inside functions, or both. In these cases formulas are typically constructed programmatically and it is not always obvious why the program failed. Correcting the error message makes the issue clearer and can save developers time in such situations (ahem...).

ja-ortiz-uniandes avatar Feb 07 '25 04:02 ja-ortiz-uniandes

A small note to the comment above that I thought was clear but didn't mention explicitly, x is not collinear with fe.

ja-ortiz-uniandes avatar Feb 12 '25 18:02 ja-ortiz-uniandes

I'm not sure why my comment disappeared, sorry. But what I meant was that you essentially have one fixed effect for every observation. There is no remaining variation to estimate the coefficient for x.
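The point about "no remaining variation" can be checked by hand. The following is a minimal sketch in base R, assuming that absorbing a fixed effect amounts to demeaning x within each group; it does not call fixest, and the variable names are illustrative only.

```r
set.seed(1)
n  <- 100
x  <- rnorm(n)
fe <- rnorm(n, 1, 3)  # continuous draw, so each observation gets its own value

# Within-group demeaning: subtract the group mean of x from x.
# ave() computes the mean of x inside each group defined by fe.
x_within <- x - ave(x, fe)

# TRUE when there are as many groups as observations
one_group_per_obs <- length(unique(fe)) == n

# TRUE when demeaning has wiped out all variation in x
no_variation_left <- all(x_within == 0)

one_group_per_obs
no_variation_left
```

With singleton groups, each group mean is the observation itself, so the demeaned regressor is identically zero and its coefficient cannot be estimated.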

caleb-kwon avatar Feb 12 '25 18:02 caleb-kwon

You are right. I'm just saying the error message doesn't reflect that fact and it would be helpful to developers if the error message described the problem better.

Alejandro-Ortiz-WBG avatar Feb 12 '25 19:02 Alejandro-Ortiz-WBG

Right! That would be helpful.

caleb-kwon avatar Feb 12 '25 19:02 caleb-kwon

Thank you!

Alejandro-Ortiz-WBG avatar Feb 12 '25 19:02 Alejandro-Ortiz-WBG

@Alejandro-Ortiz-WBG, the comment is not x is collinear with the fixed effect!

"Fixed effects" are indicator variables for each unique value of fe:

df <- data.frame(y = rnorm(3), x = rnorm(3), fe = rnorm(3, 1, 3))
model.matrix(~ 0 + x + factor(fe), data = df)
#>            x factor(fe)0.37770054215013 factor(fe)0.51974892031667
#> 1 -0.9281797                          1                          0
#> 2  0.6688002                          0                          1
#> 3  0.8449222                          0                          0
#>   factor(fe)1.18450468893623
#> 1                          0
#> 2                          0
#> 3                          1

The three indicators are collinear with x. I think you mean the numeric vector fe and x are not collinear, which is true (but not what you're trying to estimate).

This is very much an edge case because it will only happen when each value of the incorrect fixed effect is unique. This error wouldn't happen if, say, fe were discrete but with many values. In other words, there's not a really good way to "detect" when a person shouldn't be using fixed effects.
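To make the contrast concrete, here is a rough base R sketch that compares the rank of the design matrix when fe is continuous (all values unique) and when it is discrete. The helper design_rank() is hypothetical and written only for this illustration; it is not a fixest function.

```r
set.seed(1)
n <- 100
x <- rnorm(n)
fe_cont <- rnorm(n, 1, 3)                   # unique value per observation
fe_disc <- sample(1:5, n, replace = TRUE)   # a handful of groups

# Hypothetical helper: build x plus one indicator per level of fe,
# then report the number of columns and the numerical rank.
design_rank <- function(x, fe) {
  X <- model.matrix(~ 0 + x + factor(fe))
  c(columns = ncol(X), rank = qr(X)$rank)
}

rank_cont <- design_rank(x, fe_cont)
rank_disc <- design_rank(x, fe_disc)

rank_cont  # rank falls short of the column count: x is in the span of the indicators
rank_disc  # full column rank: the coefficient on x is identified
```

Only the first case is rank deficient, which is the situation the error message describes.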

kylebutts avatar Feb 19 '25 17:02 kylebutts

@kylebutts You are right that the error would not happen if there were more observations than FEs. The point, however, is that the error message reads "'x', is collinear with the fixed effects", which is inaccurate and thus leads to confusion when debugging.

ja-ortiz-uniandes avatar Feb 19 '25 21:02 ja-ortiz-uniandes

@ja-ortiz-uniandes You are mixing up the variable you called fe and the "fixed effects" which are a set of mutually exclusive indicator variables. 'x' is collinear with the fixed effects; 'x' is not collinear with the variable you called fe

kylebutts avatar Feb 19 '25 21:02 kylebutts

@kylebutts Thanks, you are right. I do believe that adding clarity to the message would be useful, something along the lines of "As many FE as observations makes estimation not possible." Saying simply "x is collinear with the FEs" gives you the idea that the variable fe is collinear with x. Additionally, the message is just (slightly) incorrect: x is not collinear with the FEs; x is collinear with the FEs and the intercept. To be clear, I appreciate the time it took to develop these custom error messages and know this required effort beyond simply saying "your model is rank deficient". Now that these messages have been incorporated, making sure they are easy to understand and accurate can further help developers when issues arise. A check of the sort if (length(unique(fe)) == NROW(dt)) stop(...) would further increase the usefulness of these errors.
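The check suggested above could look roughly like the sketch below. The function check_fe_not_saturated() and its message wording are hypothetical, written in base R as a user-side guard to run before calling feols(); nothing here reflects how fixest validates its inputs internally.

```r
# Hypothetical pre-flight check: stop early when the FE variable has
# as many unique values as there are observations.
check_fe_not_saturated <- function(data, fe_name) {
  n_groups <- length(unique(data[[fe_name]]))
  n_obs <- NROW(data)
  if (n_groups == n_obs) {
    stop(sprintf(
      "Variable '%s' has %d unique values for %d observations: as many FEs as observations makes estimation impossible.",
      fe_name, n_groups, n_obs
    ), call. = FALSE)
  }
  invisible(TRUE)
}

set.seed(1)
dt <- data.frame(y = rnorm(100), x = rnorm(100), fe = rnorm(100, 1, 3))

# Capture the error message instead of halting the script
msg <- tryCatch(
  check_fe_not_saturated(dt, "fe"),
  error = function(e) conditionMessage(e)
)
msg

# A discrete grouping variable passes the check silently
dt$grp <- rep(1:10, each = 10)
ok <- check_fe_not_saturated(dt, "grp")
```

Because the helper uses `[[` for column access, it should behave the same for a data.frame and a data.table.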

ja-ortiz-uniandes avatar Feb 19 '25 23:02 ja-ortiz-uniandes

Hi @ja-ortiz-uniandes! Thanks for the suggestion, I'm always happy to improve the error messages in general.

That said, I think there is a misunderstanding here on what the FEs are (as @kylebutts pointed out). The error message is absolutely correct. The variable x is indeed collinear with the fixed-effects, and the model is indeed misspecified.

> x is not collinear with the FEs, x is collinear with the FEs and the intercept

I think there is a misunderstanding here: fixed-effects correspond to a partition of the data, hence the point of whether there is or isn't an intercept is moot.
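One way to see why the intercept is moot is sketched below in base R, under the assumption that the fixed effects are represented as one indicator column per group. Since every observation belongs to exactly one group, the indicators sum to a column of ones, so adding an intercept does not enlarge the column space.

```r
set.seed(1)
fe <- sample(letters[1:4], 20, replace = TRUE)

# One indicator column per group: a partition of the observations
D <- model.matrix(~ 0 + factor(fe))

# Each row has exactly one 1, so the columns of D add up to the constant vector
rows_sum_to_one <- all(rowSums(D) == 1)

# Appending an explicit intercept column leaves the rank unchanged
rank_without_intercept <- qr(D)$rank
rank_with_intercept <- qr(cbind(1, D))$rank

rows_sum_to_one
rank_without_intercept == rank_with_intercept
```

Any vector that lies in the span of the indicators plus an intercept therefore already lies in the span of the indicators alone.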

But again: why not add a specific error message for the case when n_FE = n_obs? Thanks for the suggestion. I'll let you know.

lrberge avatar Feb 26 '25 17:02 lrberge

Thanks @lrberge! A better error message is all I am suggesting because I do think it would be helpful.

You are correct that FEs are a partition and that at the end of the day the inclusion of an intercept or not does not matter for the estimation. Perhaps even more so when feols() doesn't allow for partialled out FE and intercept estimation (as far as I know). I do believe there is a difference in the way we think and specify these models, however, but don't think it is worth prolonging this discussion further.

If such a change is implemented, I know I would appreciate it!

Alejandro-Ortiz-WBG avatar Feb 27 '25 21:02 Alejandro-Ortiz-WBG