Interaction terms for Categorical Variables in a RE Model
I'm trying to assess the effects on "returnAvg" of the interactions between my "SupplyClass" and "DupeClass" categorical variables, but I'm not sure I'm doing it right given that I always get the following error:
Traceback (most recent call last):
  File "F:\Python Projects\Tesi\Random Effects Regression.py", line 51, in <module>
    model = RandomEffects.from_formula("returnAvg ~ 1 + type + Color + Items + StoreRange + HasGlow + HasCutout + SupplyClass*DupeClass", data=data)
  File "F:\Python Projects\Tesi\venv\lib\site-packages\linearmodels\panel\model.py", line 2670, in from_formula
    mod = cls(dependent, exog, weights=weights, check_rank=check_rank)
  File "F:\Python Projects\Tesi\venv\lib\site-packages\linearmodels\panel\model.py", line 2616, in __init__
    super().__init__(dependent, exog, weights=weights, check_rank=check_rank)
  File "F:\Python Projects\Tesi\venv\lib\site-packages\linearmodels\panel\model.py", line 328, in __init__
    self._validate_data()
  File "F:\Python Projects\Tesi\venv\lib\site-packages\linearmodels\panel\model.py", line 479, in _validate_data
    rank_of_x = self._check_exog_rank()
  File "F:\Python Projects\Tesi\venv\lib\site-packages\linearmodels\panel\model.py", line 434, in _check_exog_rank
    raise ValueError(
ValueError: exog does not have full column rank. If you wish to proceed with model estimation irrespective of the numerical accuracy of coefficient estimates, you can set check_rank=False.
Setting check_rank=False, as the message suggests, leads to a singular matrix error instead.
I've looked through the documentation but haven't found anything on how to specify the interaction properly.
Here's how I tried inserting the interaction terms:
model = RandomEffects.from_formula("returnAvg ~ 1 + type + Color + Items + StoreRange + SupplyClass + DupeClass + SupplyClass*DupeClass", data=data)
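For context, in formula syntax `a*b` already expands to `a + b + a:b`, so listing the main effects next to it is redundant but should not break anything by itself. One possible cause of the rank error, and it is only an assumption about this data, is that some SupplyClass/DupeClass combination never occurs, which leaves an interaction column that is all zeros. A minimal sketch of how to check for that with pandas and NumPy follows; the toy data frame and its level names ("low", "mid", "high", "none", "some") are invented for illustration and are not taken from the real data set.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: level names are invented, not from the real CSV.
df = pd.DataFrame({
    "SupplyClass": ["low", "low", "high", "high", "mid", "mid"],
    "DupeClass":   ["none", "some", "none", "some", "none", "none"],
})

# Count observations in every SupplyClass x DupeClass cell.
cells = pd.crosstab(df["SupplyClass"], df["DupeClass"])
empty_cells = [
    (s, d) for s in cells.index for d in cells.columns if cells.loc[s, d] == 0
]

# Build the design matrix by hand: intercept, main-effect dummies
# (first level dropped as the reference), and their pairwise products.
supply = pd.get_dummies(df["SupplyClass"], drop_first=True, dtype=float)
dupe = pd.get_dummies(df["DupeClass"], drop_first=True, dtype=float)
inter = pd.DataFrame(
    {f"{s}:{d}": supply[s] * dupe[d] for s in supply for d in dupe}
)
intercept = pd.Series(1.0, index=df.index, name="Intercept")
X = pd.concat([intercept, supply, dupe, inter], axis=1)

# Compare the number of columns with the numerical rank.
rank = np.linalg.matrix_rank(X.to_numpy())
print("empty cells:", empty_cells)
print("columns:", X.shape[1], "rank:", rank)
```

If the rank comes out lower than the number of columns, the empty cells listed by the cross-tabulation are the first place to look. Merging sparse levels, or dropping the affected observations, are possible ways out, depending on what the model is meant to answer.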
Are these both pandas categoricals?
Can you provide some more information on the structure of the data you are modeling?
I'm pulling the data from my .csv database, which is an unbalanced panel in long format. When I say categorical, e.g. SupplyClass or DupeClass, I mean a column of strings that classifies each id in the "itemName" column. I don't run into any problems using these as separate explanatory variables without creating dummies beforehand, since linearmodels recognizes them as categoricals and handles them automatically.
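To make the structure described above concrete, here is a minimal sketch of how the string columns could be converted to explicit pandas categoricals and how the panel layout could be inspected. The small data frame stands in for the real CSV, and the time column name "week" is an assumption, since the actual time index is not stated here.

```python
import pandas as pd

# Hypothetical stand-in for the CSV; "week" is an assumed time column.
data = pd.DataFrame({
    "itemName": ["a", "a", "b", "b", "c"],
    "week": [1, 2, 1, 2, 1],
    "returnAvg": [0.1, 0.2, 0.0, 0.3, 0.5],
    "SupplyClass": ["low", "low", "high", "high", "mid"],
    "DupeClass": ["none", "none", "some", "some", "none"],
})

# Panel models in linearmodels expect an entity-time MultiIndex.
data = data.set_index(["itemName", "week"])

# Make the categorical dtype explicit instead of relying on string columns.
cat_cols = ["SupplyClass", "DupeClass"]
for col in cat_cols:
    data[col] = data[col].astype("category")

# Summarize what a reader would need to know about the data.
levels = {col: data[col].cat.categories.tolist() for col in cat_cols}
obs_per_entity = data.groupby(level="itemName").size()

print(data.dtypes)
print(levels)
print(data.shape)
print(obs_per_entity)
```

The printed dtypes, level lists, shape, and per-entity observation counts cover the size of the data, the entity and time indices, and whether the columns are true categoricals.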
What size is the array? What are the entity and time indices?