skforecast
IndexError When lags is greater than number of steps skforecast==0.4.3
Another beginner question - what are the conditions for refit = True?
I get the error below:
d:\programy\miniconda3\lib\site-packages\skforecast\ForecasterAutoreg\ForecasterAutoreg.py in _recursive_predict(self, steps, last_window, exog)
    405
    406         for i in range(steps):
--> 407             X = last_window[-self.lags].reshape(1, -1)
    408             if exog is not None:
    409                 X = np.column_stack((X, exog[i, ].reshape(1, -1)))
IndexError: index -6 is out of bounds for axis 0 with size 4
In case it matters, my input data is as follows:
data.shape (50,)
data_train.shape (37,)
data_test.shape (13,)
steps = 13
initial lags: lags = int(data_train.shape[0]*0.4) = 14
The whole grid search looks like this:
forecaster_rf = ForecasterAutoreg(
regressor = XGBRegressor(verbosity=1),
lags = lags
)
param_grid = {
'gamma': [0.5, 1, 1.5, 2, 5],
'subsample': [0.6, 0.8, 1.0],
'colsample_bytree': [0.6, 0.8, 1.0],
'max_depth': np.arange(2, 22, 2)
}
lags_grid = [6, 12, lags, [1, 3, 6, 12, lags]]
The lags below throw an error too:
lags_grid = np.arange(1, 3, 1)
lags_grid = [1]
metric = mean_squared_log_error
results_grid = grid_search_forecaster(
forecaster = forecaster_rf,
y = data_train,
param_grid = param_grid,
steps = steps,
metric = metric,
refit = True,
initial_train_size = int(len(data_train)*0.5),
return_best = True,
verbose = True
)
Originally posted by @spike8888 in https://github.com/JoaquinAmatRodrigo/skforecast/issues/137#issuecomment-1108727110
Hi!
Has anyone had the time and a chance to look at this problem?
Hi @spike8888, This error is probably due to a bug in the piece of code that stores the values of last window. We are trying to identify and solve it.
Hi @spike8888,
The error occurs when max_lag > observations used for training. In your example:
max_lag = 12
initial_train_size = 18
Therefore, the number of observations used in fit is 18 - 12 = 6.
Since last_window only stored the number of observations used in fit, 6 in this case, the function returns an error because it needs the last 12 values to predict the step n+1.
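The failure can be reproduced outside skforecast with plain NumPy. This is only a minimal sketch under stated assumptions: `lag_row` is a hypothetical helper that mirrors the indexing line from the traceback (`last_window[-self.lags].reshape(1, -1)`), and the numbers are the ones quoted above (18 training observations, max_lag = 12). It is not the library's actual code.

```python
import numpy as np

# Numbers taken from the example above (assumed, for illustration only).
initial_train_size = 18
lags = np.arange(1, 13)            # lags 1..12, so max_lag = 12
max_lag = int(lags.max())

y_train = np.arange(initial_train_size, dtype=float)

# Behaviour described above: only the observations used in fit are stored.
short_window = y_train[-(initial_train_size - max_lag):]   # 6 values

# What prediction actually needs: the last max_lag values.
full_window = y_train[-max_lag:]                            # 12 values


def lag_row(last_window, lags):
    """Hypothetical helper mirroring `last_window[-self.lags].reshape(1, -1)`."""
    return last_window[-lags].reshape(1, -1)


try:
    lag_row(short_window, lags)
    short_window_failed = False
except IndexError:
    # Lags 7..12 reach further back than the 6 stored values.
    short_window_failed = True

row = lag_row(full_window, lags)   # works: one row with 12 lagged values
```

With the 6-value window the lookup for lag 7 and beyond falls outside the array, which is the kind of IndexError shown in the traceback. With the 12-value window every lag can be resolved.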
We fixed it in version 0.5.0. We are still developing this version, but you can install it from GitHub by running the following in the shell:
pip install git+https://github.com/JoaquinAmatRodrigo/[email protected]
Please note that some features in this release, like bayesian_search_forecaster, are still under development. But whatever you do with the previous versions should work in the new one.
Thank you very much for the answer. I will check it out soon.
I checked it out. The error is gone, but it seems there is a stop rule in the code, which is somewhat dangerous: in my case the grid search stopped after 2 models were calculated. Please consider displaying a warning that, given the mix of lags and steps, not all combinations will be calculated.
Is there a function that can return max_lag based on the data?
Hello @spike8888, Could you show an example of your grid_search? I didn't understand your problem.
Regarding max_lag, the training matrix will have a length equal to len(y) - max_lag. So, in an extreme case, if your series y has 50 data points and you use max_lag = 48, you will only have 2 rows to train your model.
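The len(y) - max_lag relationship can be checked by building the lag matrix by hand. The sketch below uses plain NumPy, and `build_lag_matrix` is a hypothetical helper written for illustration, not a skforecast function.

```python
import numpy as np


def build_lag_matrix(y, lags):
    """Hypothetical helper: build the lag matrix X and the target vector."""
    y = np.asarray(y, dtype=float)
    lags = np.asarray(lags)
    max_lag = int(lags.max())
    if len(y) - max_lag <= 0:
        raise ValueError("max_lag must be smaller than len(y)")
    # Each row predicts y[t] for t = max_lag .. len(y) - 1;
    # the column for lag k holds y[t - k].
    targets = np.arange(max_lag, len(y))
    X = np.column_stack([y[targets - k] for k in lags])
    return X, y[targets]


y = np.arange(50)
X_small, y_small = build_lag_matrix(y, lags=np.arange(1, 4))    # max_lag = 3
X_tiny, y_tiny = build_lag_matrix(y, lags=np.arange(1, 49))     # max_lag = 48

print(X_small.shape, X_tiny.shape)   # (47, 3) (2, 48)
```

With max_lag = 3 the 50-point series yields 47 training rows; with max_lag = 48 only 2 rows remain, which matches the extreme case described above.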
It seems I do not understand the whole concept of lags. Are they used to predict the next step (the next value I want to predict)? If so, why do we pass the whole history as training data, which is much larger than the lags?
Hello @spike8888,
You can find a good explanation of lags and the training matrix in the documentation, or even by googling it.
To summarize, an autoregressive model is trained on its own past behavior. If you use, for example, lags=3, it will take the 3 steps before each point to train the model. The function create_train_X_y can help you understand this:
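# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg

# Create a forecaster with lags=3
# ==============================================================================
forecaster = ForecasterAutoreg(
regressor = RandomForestRegressor(random_state=123),
lags = 3
)
# Create a series with 10 points
# ==============================================================================
y = pd.Series(np.arange(10))
# create_train_X_y returns the pair (X, y); print the target here
X_train, y_train = forecaster.create_train_X_y(y=y)
print(y_train)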
Then we can print the training matrix.
X:
forecaster.create_train_X_y(y=y)[0]
| | lag_1 | lag_2 | lag_3 |
|---|---|---|---|
| 3 | 2 | 1 | 0 |
| 4 | 3 | 2 | 1 |
| 5 | 4 | 3 | 2 |
| 6 | 5 | 4 | 3 |
| 7 | 6 | 5 | 4 |
| 8 | 7 | 6 | 5 |
| 9 | 8 | 7 | 6 |
y:
forecaster.create_train_X_y(y=y)[1]
| | y |
|---|---|
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 6 | 6 |
| 7 | 7 |
| 8 | 8 |
| 9 | 9 |
Fixed it in version 0.5.0.