statsforecast

Problem getting fitted values using cross validation with a spark dataframe

Open Jonathan-87 opened this issue 9 months ago • 2 comments

What happened + What you expected to happen

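When running cross-validation on a Spark DataFrame with `fitted=True`, retrieving the in-sample fitted values afterwards with `cross_validation_fitted_values()` fails with "Exception: Please run cross_validation method using fitted=True". I expected the fitted values to be returned, since `fitted=True` was passed to `cross_validation`.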

Versions / Dependencies

statsforecast 1.7.4

Reproduction script

```python
from pyspark.sql import SparkSession
from statsforecast import StatsForecast
from statsforecast.models import AutoETS
from statsforecast.utils import AirPassengersDF

# `spark` is predefined on Databricks; elsewhere, create or reuse a session.
spark = SparkSession.builder.getOrCreate()

models = [AutoETS(season_length=12)]
sf = StatsForecast(models=models, freq='M', verbose=True, n_jobs=-1)

horizon = 12
n_windows = 5

# Partition by series and cache the distributed frame (cache() already persists it).
spk_df = spark.createDataFrame(AirPassengersDF).repartition("unique_id").cache()

cv_results = sf.cross_validation(
    df=spk_df, h=horizon, step_size=horizon, n_windows=n_windows, refit=True,
    level=[80, 90], id_col="unique_id", time_col="ds", target_col="y", fitted=True,
)

# Raises: Exception: Please run cross_validation method using fitted=True
insample_fcst = sf.cross_validation_fitted_values()
```

Issue Severity

None

Jonathan-87, Apr 30 '24 16:04
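To make the parameters in the script concrete, here is a minimal pure-Python sketch of the window layout we assume `cross_validation` uses with `h=12`, `step_size=12` and `n_windows=5` on the 144-month AirPassengers series. `cv_windows` is a hypothetical helper, not part of statsforecast, and the arithmetic is our reading of the library's behaviour rather than code taken from it.

```python
def cv_windows(n, h, step_size, n_windows):
    """Return (train_end, test_end) index pairs, oldest window first.

    Hypothetical helper: for each window the model is fitted on y[:train_end]
    and its h-step forecast is scored against y[train_end:test_end].
    """
    # Total span held out across all windows.
    test_size = h + step_size * (n_windows - 1)
    if test_size >= n:
        raise ValueError("series too short for the requested windows")
    windows = []
    for i in range(n_windows):
        # Each later window moves the training cutoff forward by step_size.
        train_end = n - test_size + i * step_size
        windows.append((train_end, train_end + h))
    return windows


# AirPassengers: 144 monthly observations, one series.
for train_end, test_end in cv_windows(144, h=12, step_size=12, n_windows=5):
    print(f"fit on y[0:{train_end}], forecast y[{train_end}:{test_end}]")
```

With `refit=True` each window gets its own model, so under this assumption the in-sample fitted values the script asks for would come from five separate fits, trained on 84, 96, 108, 120 and 132 observations.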

Hey @Jonathan-87, the cross_validation method doesn't support fitted values when running on a distributed backend; only the forecast method does.

jmoralez, Apr 30 '24 16:04
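Given that limitation, one possible workaround, sketched under the assumption that the data fits in driver memory, is to collect the Spark DataFrame to pandas with PySpark's `toPandas()` and run cross-validation on the local path, where `fitted=True` and `cross_validation_fitted_values()` are supported. `cv_with_fitted` is a hypothetical helper, not a statsforecast API; it passes pandas frames through, converts anything exposing `toPandas()`, and rejects other frame types.

```python
import pandas as pd


def cv_with_fitted(sf, df, **cv_kwargs):
    """Run cross-validation and return (cv_results, fitted_values).

    Hypothetical helper. Assumes fitted values are only available on the
    local (pandas) path, so a Spark DataFrame is collected to the driver
    first. Only use this when the data fits in driver memory.
    """
    if not isinstance(df, pd.DataFrame):
        if not hasattr(df, "toPandas"):
            raise TypeError(f"unsupported frame type: {type(df).__name__}")
        df = df.toPandas()  # pyspark.sql.DataFrame -> pandas.DataFrame
    # Always request fitted values, overriding any fitted=... passed in.
    cv_kwargs["fitted"] = True
    cv_results = sf.cross_validation(df=df, **cv_kwargs)
    fitted_values = sf.cross_validation_fitted_values()
    return cv_results, fitted_values
```

With the names from the reproduction script, the call would be `cv_with_fitted(sf, spk_df, h=horizon, step_size=horizon, n_windows=n_windows, refit=True, level=[80, 90])`. The trade-off is that the computation then runs on the driver instead of the Spark cluster.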

Oh OK, good to know. I haven't seen this mentioned in the documentation; maybe I missed it. I suppose the same applies to the fit and fit_predict methods (I'm opening another issue about this).

Jonathan-87, Apr 30 '24 17:04