statsforecast
Problem getting fitted values using cross validation with a spark dataframe
What happened + What you expected to happen
When using cross validation with a Spark DataFrame, trying to recover the fitted values raises an error: "Exception: Please run cross_validation method using fitted=True".
Versions / Dependencies
statsforecast 1.7.4
Reproduction script
import datetime as dt
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from statsforecast.utils import AirPassengersDF
from statsforecast import StatsForecast
from statsforecast.models import AutoETS

models = [AutoETS(season_length=12)]

sf = StatsForecast(
    models=models,
    freq='M',
    verbose=True,
    n_jobs=-1
)

horizon = 12
n_windows = 5

# `spark` is assumed to be an already active SparkSession (e.g. in a notebook).
spk_df = spark.createDataFrame(AirPassengersDF)
spk_df = spk_df.repartition("unique_id")
spk_df.cache().persist()

cv_results = sf.cross_validation(
    df=spk_df,
    h=horizon,
    step_size=horizon,
    n_windows=n_windows,
    refit=True,
    level=[80, 90],
    id_col="unique_id",
    time_col="ds",
    target_col="y",
    fitted=True
)

# This call raises the exception quoted above.
insample_fcst = sf.cross_validation_fitted_values()
Issue Severity
None
Hey @Jonathan-87, the cross_validation method doesn't support fitted values with distributed dataframes; only the forecast method does.
Oh OK, good to know. I have not seen a reference to this in the documentation, but maybe I missed it. I suppose it is the same for the fit and fit_predict methods (I wrote another issue about this).