
ARIMA.arima_res_ doesn't store pd.Series name but statsmodels does

Open JavierEscobarOrtiz opened this issue 2 years ago • 0 comments

Describe the question you have

Hello!

We are creating a wrapper in Skforecast for forecasting using ARIMA models and we are using pmdarima as a dependency.

We are trying to call the append method from statsmodels on ARIMA().arima_res_, and we are finding different behavior between pmdarima and statsmodels.

Inside ARIMA.arima_res_ there is an attribute that stores the original endogenous data (ARIMA().arima_res_.model.data.orig_endog). When statsmodels is used, it stores the pd.Series and its name but when pmdarima is used the name is removed.

As a result, when we try to apply the append() method we get the following error:

ValueError: Columns must match to concatenate along rows.

Reproducible example:

  • data:
import pandas as pd
import numpy as np

np.random.seed(123)
y_datetime = pd.Series(data=np.random.rand(50))
y_datetime.name = 'y'
y_datetime.index = pd.date_range(start='2000', periods=50, freq='A')
print(y_datetime.head(5))

last_window_datetime = pd.Series(data=np.random.rand(50))
last_window_datetime.name = 'y'
last_window_datetime.index = pd.date_range(start='2050', periods=50, freq='A')

2000-12-31    0.696469
2001-12-31    0.286139
2002-12-31    0.226851
2003-12-31    0.551315
2004-12-31    0.719469
Freq: A-DEC, Name: y, dtype: float64

  • statsmodels: (Here append() works)
from statsmodels.tsa.statespace.sarimax import SARIMAX

mod = SARIMAX(endog=y_datetime, order=(1,1,1))
res = mod.fit()
print(res.model.data.orig_endog.head(5))

new_res = res.append(last_window_datetime, refit=False)

2000-12-31    0.696469
2001-12-31    0.286139
2002-12-31    0.226851
2003-12-31    0.551315
2004-12-31    0.719469
Freq: A-DEC, Name: y, dtype: float64

  • pmdarima: (Here the Name is deleted and append() does not work)
from pmdarima.arima import ARIMA

mod = ARIMA(order=(1,1,1))
mod.fit(y_datetime)
print(mod.arima_res_.model.data.orig_endog.head(5))

mod.arima_res_ = mod.arima_res_.append(last_window_datetime, refit=False)

2000-12-31    0.696469
2001-12-31    0.286139
2002-12-31    0.226851
2003-12-31    0.551315
2004-12-31    0.719469
Freq: A-DEC, dtype: float64
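A possible workaround, sketched here with pandas only and not verified against pmdarima 2.0.2: if the error comes solely from the name mismatch, the new data could be given the same (missing) name as the stored series before calling append(). The helper `match_name` is hypothetical, and `stored` and `new` are stand-ins for `mod.arima_res_.model.data.orig_endog` and `last_window_datetime`.

```python
import pandas as pd


def match_name(new_data: pd.Series, stored: pd.Series) -> pd.Series:
    """Return a copy of new_data carrying the same name as stored.

    Hypothetical helper for illustration; it is not part of pmdarima
    or statsmodels.
    """
    aligned = new_data.copy()
    aligned.name = stored.name  # may be None if the stored series lost its name
    return aligned


# Stand-ins for the objects in this report:
# `stored` plays the role of arima_res_.model.data.orig_endog (no name),
# `new` plays the role of last_window_datetime (named 'y').
stored = pd.Series([0.1, 0.2, 0.3])
new = pd.Series([0.4, 0.5], name="y")

aligned = match_name(new, stored)
print(aligned.name)  # None
```

With the pmdarima example above, this would be used as `mod.arima_res_.append(match_name(last_window_datetime, mod.arima_res_.model.data.orig_endog), refit=False)`. We have not confirmed that this avoids the ValueError, and it does not address why the name is dropped in the first place.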

Versions (if necessary)

Session info:

-----
numpy               1.23.5
pandas              1.4.0
pmdarima            2.0.2
pytest              7.1.2
session_info        1.0.0
skforecast          0.7.dev
sklearn             1.1.0
statsmodels         0.13.5
-----
IPython             8.5.0
jupyter_client      7.3.5
jupyter_core        4.11.1
notebook            6.4.12
-----
Python 3.9.13 (main, Aug 25 2022, 23:51:50) [MSC v.1916 64 bit (AMD64)]
Windows-10-10.0.19042-SP0
-----
Session information updated at 2023-01-09 12:08

JavierEscobarOrtiz • Jan 09 '23 11:01