skforecast
Grid search forecaster error:
I am trying to implement a grid search to tune the hyperparameters of a RandomForestRegressor, but I receive the following error and I do not know exactly where the problem is. I also tried `.iloc` instead of `.loc`, and the problem persists.
```
KeyError: "None of [Int64Index([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,\n ...\n 666, 667, 668, 669, 670, 671, 672, 673, 674, 675],\n dtype='int64', length=43011)] are in the [index]"
```
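For context, this message is raised by pandas: `.loc` looks keys up by label, and pandas raises `KeyError: "None of [...] are in the [index]"` when none of the requested labels exist in the index. The minimal sketch below uses a made-up series with a datetime index (not the ELES-MAS-5001 data) and reproduces the message by passing integers to `.loc`, while `.iloc` treats the same integers as positions. It only illustrates what the error means; it does not show where inside the grid search the lookup happens.

```python
import pandas as pd

# Made-up series with a datetime index (illustration only).
idx = pd.date_range("2022-01-01", periods=3, freq="D")
y = pd.Series([1.0, 2.0, 3.0], index=idx, name="y")

message = ""
try:
    # .loc looks up labels; the integers 1 and 2 are not dates in this index.
    y.loc[[1, 2]]
except KeyError as exc:
    message = str(exc)

# .iloc interprets the same integers as positions and succeeds.
by_position = y.iloc[[1, 2]]
print(message)
print(by_position)
```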
My code:
```python
from data_preparation import Preparation
from missing_timestamps import remove_duplicates
import pandas as pd
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.model_selection import grid_search_forecaster
from sklearn.ensemble import RandomForestRegressor
from skforecast.utils import save_forecaster
from skforecast.utils import load_forecaster
# marvin

data = Preparation(r'/home/ieftimska/operato-meteo-1/data/MAS_processed/ELES-MAS-5001.csv.gz', "AMBIENT_TEMPERATURE")
#data = Preparation(r'/home/iva/Desktop/operato-meteo-1/data/MAS_processed/ELES-MAS-5001.csv.gz', "AMBIENT_TEMPERATURE")
train, test = data.split()
train_processed = remove_duplicates(train)
#train_processed_ = train_processed["AMBIENT_TEMPERATURE"].copy().squeeze()
test_processed = remove_duplicates(test)
#test_processed_ = test_processed["AMBIENT_TEMPERATURE"].copy().squeeze()
whole_data = pd.concat([train_processed, test_processed])
whole_data = whole_data.rename(columns={"AMBIENT_TEMPERATURE": "y"})
whole_data.index = whole_data.index.rename("datetime")

forecaster = ForecasterAutoreg(
    regressor=RandomForestRegressor(random_state=123, n_jobs=-1, max_depth=10, n_estimators=100),
    lags=865
)

param_grid = {
    'n_estimators': [50, 100],
    'max_depth': [5, 10, 15]
}

# Lags used as predictors
lags_grid = [i for i in range(1, 865)]

results_grid = grid_search_forecaster(
    forecaster=forecaster,
    y=whole_data.loc[:, "y"],
    param_grid=param_grid,
    lags_grid=lags_grid,
    steps=864,
    refit=False,
    metric='mean_squared_error',
    initial_train_size=len(whole_data.loc[:"2022"]),
    fixed_train_size=False,
    return_best=True,
    n_jobs='auto',
    verbose=False,
    show_progress=True
)
results_grid.to_csv("results_grid_search.csv")
```
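A side note on the size of this search: skforecast's documentation describes `lags_grid` as a list of candidate lag configurations, where an int `n` stands for lags 1 to `n` and a list stands for exactly those lags. Under that reading, `[i for i in range(1, 865)]` requests 864 separate configurations rather than one configuration with lags 1 to 864. The plain-Python sketch below (no skforecast involved, values copied from the code above) only counts the combinations that this implies, assuming those semantics.

```python
from itertools import product

# Values copied from the question's code.
lags_grid = [i for i in range(1, 865)]
param_grid = {'n_estimators': [50, 100], 'max_depth': [5, 10, 15]}

# Every combination of the regressor parameters (2 x 3 = 6).
param_combinations = [
    dict(zip(param_grid.keys(), values)) for values in product(*param_grid.values())
]

# Assumption: each element of lags_grid is one candidate configuration,
# so the search evaluates every (lags, params) pair.
n_fits = len(lags_grid) * len(param_combinations)

# Assumption: an int n means lags 1..n, so the largest candidate uses 864 lags.
largest_lags = list(range(1, max(lags_grid) + 1))

print(len(param_combinations), n_fits, len(largest_lags))
```

If a single configuration with lags 1 to 864 is what was intended, `lags_grid=[864]` would express it under the same assumption.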
This is the data, in case someone wants to reproduce the result: ELES-MAS-5001.csv.gz
Hello @tavlox
The problem is probably in `len(whole_data.loc[:"2022"])`. If you are using `.iloc`, you should use an int to access position 2022, not `"2022"`. With `.loc`, it depends on your index: if it is a datetime index, you should probably specify something like `"01-01-2022"`.
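To illustrate the difference, here is a small pandas sketch on a made-up daily series that crosses the 2021/2022 boundary (not the ELES-MAS-5001 data). `.loc` slices by label and includes the end label, and on a sorted datetime index a partial string such as `"2021"` covers that whole year. `.iloc` slices by position and needs an int. In both cases, `len(...)` of the slice gives the integer that `initial_train_size` expects.

```python
import pandas as pd

# Made-up daily series spanning 2021-12-30 .. 2022-01-03 (illustration only).
idx = pd.date_range("2021-12-30", periods=5, freq="D")
y = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], index=idx, name="y")

# Label-based slicing: the partial string "2021" covers the whole year,
# so this keeps everything up to and including 2021-12-31.
train_by_year = y.loc[:"2021"]

# An explicit ISO date gives the same result and is unambiguous.
train_by_date = y.loc[:"2021-12-31"]

# Position-based slicing takes an int (the first 2 rows), not a string.
train_by_position = y.iloc[:2]

# initial_train_size should be a plain integer count of training rows.
initial_train_size = len(train_by_date)
print(initial_train_size)
```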
The same issue still appears, even when I use, for example, a separate train set without `.loc`, i.e. `len(train_set)`.
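Since the error persists even with a plain integer train size, the index of `y` itself may be worth inspecting. As I understand skforecast's documentation, `y` is expected to have either a `DatetimeIndex` with a frequency set or a `RangeIndex`. The sketch below is a hypothetical check (the helper name `describe_index` and the sample data are made up) that reports the index type, uniqueness, ordering and frequency, then uses pandas' `asfreq` to set an explicit frequency on a unique, sorted index. It does not claim to identify the cause of this particular error.

```python
import pandas as pd


def describe_index(y: pd.Series) -> dict:
    """Hypothetical helper: summarise the properties of a series' index."""
    idx = y.index
    return {
        "type": type(idx).__name__,
        "is_unique": idx.is_unique,
        "n_duplicates": int(idx.duplicated().sum()),
        "is_monotonic_increasing": idx.is_monotonic_increasing,
        # Only DatetimeIndex-like indexes have a freq attribute.
        "freq": getattr(idx, "freq", None),
    }


# Made-up daily data whose index was built without an explicit frequency,
# as can happen after reading a CSV or concatenating pieces.
times = pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04"])
y = pd.Series([1.0, 2.0, 3.0, 4.0], index=times, name="y")
before = describe_index(y)

# On a unique, sorted index, asfreq() sets an explicit frequency
# (it would insert NaN rows for any missing timestamps).
y_fixed = y.asfreq("D")
after = describe_index(y_fixed)

print(before)
print(after)
```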