Interaction terms for Categorical Variables in a RE Model

Open Arceus opened this issue 2 years ago • 4 comments

I'm trying to assess the effects on "returnAvg" of the interactions between my "SupplyClass" and "DupeClass" categorical variables, but I'm not sure I'm doing it right given that I always get the following error:

Traceback (most recent call last):
  File "F:\Python Projects\Tesi\Random Effects Regression.py", line 51, in <module>
    model = RandomEffects.from_formula("returnAvg ~ 1 + type + Color + Items + StoreRange + HasGlow + HasCutout + SupplyClass*DupeClass", data=data)
  File "F:\Python Projects\Tesi\venv\lib\site-packages\linearmodels\panel\model.py", line 2670, in from_formula
    mod = cls(dependent, exog, weights=weights, check_rank=check_rank)
  File "F:\Python Projects\Tesi\venv\lib\site-packages\linearmodels\panel\model.py", line 2616, in __init__
    super().__init__(dependent, exog, weights=weights, check_rank=check_rank)
  File "F:\Python Projects\Tesi\venv\lib\site-packages\linearmodels\panel\model.py", line 328, in __init__
    self._validate_data()
  File "F:\Python Projects\Tesi\venv\lib\site-packages\linearmodels\panel\model.py", line 479, in _validate_data
    rank_of_x = self._check_exog_rank()
  File "F:\Python Projects\Tesi\venv\lib\site-packages\linearmodels\panel\model.py", line 434, in _check_exog_rank
    raise ValueError(
ValueError: exog does not have full column rank. If you wish to proceed with model estimation irrespective of the numerical accuracy of coefficient estimates, you can set check_rank=False.

Setting check_rank=False instead leads to a singular matrix error.

I've looked through the documentation but haven't found how to do this properly. Here's how I tried inserting the interaction terms:

    model = RandomEffects.from_formula("returnAvg ~ 1 + type + Color + Items + StoreRange + SupplyClass + DupeClass + SupplyClass*DupeClass", data=data)
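For context, in formula syntax `a*b` is shorthand for `a + b + a:b`, so writing the main effects alongside `SupplyClass*DupeClass` adds nothing new. One common cause of a rank-deficient design with categorical interactions (an assumption here, since the data is not shown) is a combination of levels that never occurs, which leaves an all-zero interaction column. A minimal sketch using only pandas and NumPy, with made-up level names and a hypothetical helper `interaction_design`:

```python
import numpy as np
import pandas as pd


def interaction_design(df, a, b):
    """Intercept + main-effect dummies + pairwise interaction dummies.

    Uses treatment coding (first level of each variable dropped), which
    mimics what a formula such as `1 + a*b` typically produces.
    """
    da = pd.get_dummies(df[a], prefix=a, drop_first=True, dtype=float)
    db = pd.get_dummies(df[b], prefix=b, drop_first=True, dtype=float)
    cols = {"Intercept": np.ones(len(df))}
    cols.update({c: da[c].to_numpy() for c in da})
    cols.update({c: db[c].to_numpy() for c in db})
    for ca in da:
        for cb in db:
            # Interaction dummy is the product of the two main-effect dummies.
            cols[f"{ca}:{cb}"] = da[ca].to_numpy() * db[cb].to_numpy()
    return pd.DataFrame(cols, index=df.index)


# Toy data (invented): the combination ("mid", "some") never occurs.
toy = pd.DataFrame(
    {
        "SupplyClass": ["high", "high", "low", "low", "mid", "mid"],
        "DupeClass": ["none", "some", "none", "some", "none", "none"],
    }
)

X = interaction_design(toy, "SupplyClass", "DupeClass")
rank = np.linalg.matrix_rank(X.to_numpy())
zero_cols = [c for c in X.columns if not X[c].any()]

print(f"columns: {X.shape[1]}, rank: {rank}")
print("all-zero columns:", zero_cols)
```

If a check like this reports all-zero columns on the real data, merging sparse levels or dropping the affected interaction term are possible ways to restore full rank.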

Arceus, Nov 05 '23 00:11

Are these both pandas categoricals?

bashtage, Nov 07 '23 17:11

Can you provide some more information on the structure of the data you are modeling?

bashtage, Nov 07 '23 17:11

I'm pulling the data from my .csv database, which is an unbalanced panel in long format. When I say categorical, e.g. SupplyClass or DupeClass, I mean a column of strings that classifies each id from the "itemName" column. I don't run into any problems using these as separate explanatory variables without creating dummies beforehand, as linearmodels recognizes them as categoricals and handles them automatically.
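If the class columns are plain strings, one option is to convert them to pandas categoricals explicitly and to set a two-level (entity, time) index before fitting, since the panel models in linearmodels expect a MultiIndex with the entity first and the time second. The sketch below uses a small invented frame in place of the CSV; `period` is a hypothetical name for the time column, which the thread does not give.

```python
import pandas as pd

# Invented stand-in for the long-format CSV; only the column names
# "itemName", "SupplyClass", "DupeClass" and "returnAvg" come from the thread.
data = pd.DataFrame(
    {
        "itemName": ["A", "A", "B", "B", "C"],
        "period": [1, 2, 1, 2, 1],  # hypothetical time column
        "SupplyClass": ["low", "low", "high", "high", "mid"],
        "DupeClass": ["none", "some", "none", "some", "none"],
        "returnAvg": [0.10, 0.12, 0.05, 0.07, 0.20],
    }
)

# Make the dtype explicit instead of relying on string handling.
for col in ["SupplyClass", "DupeClass"]:
    data[col] = data[col].astype("category")

# Entity first, time second.
data = data.set_index(["itemName", "period"])

print(data.dtypes)
print(data.index.names)
```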

Arceus, Nov 07 '23 19:11

What size is the array? What are the entity and time indices?

bashtage, Nov 07 '23 20:11
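The questions above (array size, entity and time indices) can be answered with a short summary of the indexed frame. The following is only a sketch on invented data: `panel_summary` is a hypothetical helper, `period` is an assumed time column name, and the cross-tabulation is added because empty cells between the two class variables are one plausible source of the rank problem.

```python
import pandas as pd


def panel_summary(df, a, b):
    """Summarize a frame indexed by (entity, time) and list empty a-by-b cells."""
    entity_name, time_name = df.index.names
    # Count observations for each combination of levels of a and b.
    counts = pd.crosstab(df[a].to_numpy(), df[b].to_numpy())
    empty = [
        (row, col)
        for row in counts.index
        for col in counts.columns
        if counts.loc[row, col] == 0
    ]
    return {
        "shape": df.shape,
        "entity_index": entity_name,
        "time_index": time_name,
        "n_entities": df.index.get_level_values(0).nunique(),
        "n_periods": df.index.get_level_values(1).nunique(),
        "empty_cells": empty,
    }


# Invented unbalanced panel: item "C" is observed in one period only.
panel = pd.DataFrame(
    {
        "itemName": ["A", "A", "B", "B", "C"],
        "period": [1, 2, 1, 2, 1],  # hypothetical time column
        "SupplyClass": ["low", "low", "high", "high", "mid"],
        "DupeClass": ["none", "some", "none", "some", "none"],
    }
).set_index(["itemName", "period"])

summary = panel_summary(panel, "SupplyClass", "DupeClass")
for key, value in summary.items():
    print(f"{key}: {value}")
```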