statsforecast
ValueError: xreg is rank deficient
What happened + What you expected to happen
Hi,
I am trying to use exogenous features with the StatsForecast.fit method, but it fails with:
ValueError: xreg is rank deficient
I am using one-hot encoding for the month, so some columns are 0 throughout in the example data below. In the full data, however, there are no all-zero columns and no constant columns, and no pair of features has an absolute correlation above 0.7.
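A low pairwise correlation does not rule out rank deficiency. Every complete set of one-hot columns sums to 1 on every row, so the day-of-week set and the month set are exactly linearly dependent on each other, even though any two individual columns are only weakly correlated. The sketch below uses a synthetic two-year daily date range (an assumption, not the reporter's data). It builds full day-of-week and month dummies with pandas and compares the largest absolute pairwise correlation with the matrix rank reported by NumPy.

```python
import numpy as np
import pandas as pd

# Synthetic daily dates spanning two years (an assumption, not the reporter's data)
ds = pd.date_range("2022-01-01", "2023-12-31", freq="D")

# Full one-hot encodings: every level is kept, as in the reproduction script
dow = pd.get_dummies(ds.dayofweek, prefix="day_of_week").astype(float)
month = pd.get_dummies(ds.month, prefix="month_indicator").astype(float)
X = pd.concat([dow, month], axis=1)

# Largest absolute correlation between two distinct columns
corr = X.corr().abs().to_numpy()
np.fill_diagonal(corr, 0.0)
max_abs_corr = corr.max()

# Rank of the regressor matrix versus its number of columns
rank = np.linalg.matrix_rank(X.to_numpy())
n_cols = X.shape[1]

print(f"max |corr| = {max_abs_corr:.2f}")
print(f"rank = {rank}, columns = {n_cols}")
```

Here the largest pairwise correlation stays well below 0.7, yet the rank is one less than the number of columns (18 vs 19).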
Versions / Dependencies
Python: 3.11, statsforecast: 1.7.4
Reproduction script
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import (
    AutoARIMA,
    AutoETS,
    HoltWinters,
    MSTL,
    RandomWalkWithDrift,
    SeasonalNaive,
)

models = [
    AutoARIMA(season_length=31, nmodels=94, allowdrift=True),
    # AutoCES(season_length=30),
    AutoETS(season_length=31),
    HoltWinters(season_length=31),
    MSTL(season_length=31, trend_forecaster=AutoARIMA(), alias="MSTL-ARIMA"),
    MSTL(season_length=31),
    # AutoTheta(season_length=31),
    # DOT(season_length=31),
    # SeasonalWindowAverage(
    #     window_size=60, season_length=30
    # ),
    # SeasonalWindowAverage(
    #     window_size=90, season_length=30, alias="SeasWA30-93"
    # ),
    # SeasonalWindowAverage(
    #     window_size=120, season_length=30, alias="SeasWA30-120"
    # ),
    RandomWalkWithDrift(),
    SeasonalNaive(season_length=31),
]
dc_models = StatsForecast(
    models=models,
    freq="D",
    n_jobs=-1,
    verbose=True,
)
data = {
    'ds': ['2024-04-01', '2024-04-02'],
    'unique_id': [1, 2],
    'y': [100, 200],
    'holiday': [0, 1],
    'daysuntilendmonth': [10, 9],
    'tax_return': [1, 0],
    'bailiff_finland': [0, 1],
    'salary': [5000, 6000],
    'day_of_week_0': [0, 0],
    'day_of_week_1': [0, 1],
    'day_of_week_2': [1, 0],
    'day_of_week_3': [0, 0],
    'day_of_week_4': [0, 0],
    'day_of_week_5': [0, 0],
    'day_of_week_6': [0, 0],
    'month_indicator_1': [0, 0],
    'month_indicator_2': [0, 0],
    'month_indicator_3': [0, 0],
    'month_indicator_4': [1, 1],
    'month_indicator_5': [0, 0],
    'month_indicator_6': [0, 0],
    'month_indicator_7': [0, 0],
    'month_indicator_8': [0, 0],
    'month_indicator_9': [0, 0],
    'month_indicator_10': [0, 0],
    'month_indicator_11': [0, 0],
    'month_indicator_12': [0, 0],
    'quarter_1': [0, 0],
    'quarter_2': [1, 1],
    'quarter_3': [0, 0],
    'quarter_4': [0, 0],
}
data = pd.DataFrame(data)
data['ds'] = pd.to_datetime(data['ds'])
exog = True
dc_models.fit(df=data if exog else data[['ds', 'unique_id', 'y']], prediction_intervals=None)
Update: if I remove both the day_of_week and month_indicator one-hot encodings, it works, but I am not sure why. Is there another way to include the month, since it is an important feature?
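One way to see which columns cause the rank deficiency is to add columns one at a time and keep only those that increase the rank. The helper find_redundant_columns below is hypothetical (it is not part of statsforecast). It assumes the exogenous features are numeric columns of a pandas DataFrame and relies on np.linalg.matrix_rank. The calendar features in the usage part are built from synthetic dates, not from the reporter's data. Which member of a dependent group gets flagged depends on column order.

```python
import numpy as np
import pandas as pd


def find_redundant_columns(X: pd.DataFrame) -> list[str]:
    """Hypothetical helper: return columns that are linear combinations of earlier ones."""
    kept: list[str] = []
    redundant: list[str] = []
    for col in X.columns:
        candidate = X[kept + [col]].to_numpy(dtype=float)
        # Keep the column only if it increases the rank of the kept set
        if np.linalg.matrix_rank(candidate) > len(kept):
            kept.append(col)
        else:
            redundant.append(col)
    return redundant


# Usage on synthetic calendar features (an assumption, not the reporter's data)
ds = pd.date_range("2022-01-01", "2023-12-31", freq="D")
features = pd.concat(
    [
        pd.get_dummies(ds.dayofweek, prefix="day_of_week"),
        pd.get_dummies(ds.month, prefix="month_indicator"),
        pd.get_dummies(ds.quarter, prefix="quarter"),
    ],
    axis=1,
).astype(float)

redundant = find_redundant_columns(features)
print(redundant)
# ['month_indicator_12', 'quarter_1', 'quarter_2', 'quarter_3', 'quarter_4']
```

With all three full sets present, the last month dummy and every quarter dummy are exact linear combinations of the columns before them.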
Issue Severity
High: It blocks me from completing my task.
Hey @obiii, thanks for using statsforecast. Can you please provide a minimal reproducible example? You can follow the tips here.
Hi @jmoralez, I have updated the question.
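Thanks! I believe this is due to the collinearity that the dummies introduce. Each complete set of dummies sums to 1 on every row, so, for example, the sum of the seven day-of-week columns equals the sum of the twelve month columns, and the regressor matrix loses full column rank. Can you try dropping one level from each set? That is, use 6 dummies for day of week, 11 for month, and 3 for quarter. You can read more about the problem here.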
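A minimal sketch of dropping one level per group, assuming the calendar features are derived from ds with pandas: pd.get_dummies(..., drop_first=True) yields 6 day-of-week and 11 month dummies. One caveat: quarter dummies are sums of month dummies (quarter_2 equals the sum of months 4 to 6, for example). Keeping both month and quarter dummies therefore leaves the matrix rank deficient even after dropping a level from each set, and the sketch checks this as well. The dates are synthetic, not the reporter's data.

```python
import numpy as np
import pandas as pd

# Synthetic daily dates (an assumption, not the reporter's data)
ds = pd.date_range("2022-01-01", "2023-12-31", freq="D")
calendar = pd.DataFrame({"day_of_week": ds.dayofweek, "month_indicator": ds.month})

# drop_first=True removes one level per group: 6 day-of-week + 11 month dummies
exog = pd.get_dummies(
    calendar, columns=["day_of_week", "month_indicator"], drop_first=True
).astype(float)
rank = np.linalg.matrix_rank(exog.to_numpy())
print(exog.shape[1], rank)  # full column rank: 17 17

# Adding quarter dummies (also with one level dropped) reintroduces an exact
# dependency, because each quarter dummy is a sum of month dummies
quarters = pd.get_dummies(ds.quarter, prefix="quarter", drop_first=True).astype(float)
with_quarters = pd.concat([exog, quarters], axis=1)
rank_q = np.linalg.matrix_rank(with_quarters.to_numpy())
print(with_quarters.shape[1], rank_q)  # rank deficient: 20 17
```

The resulting columns would go into the training frame next to unique_id, ds, and y. The same columns for the forecast horizon would also be needed at prediction time (for example, in X_df).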
This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one.