skforecast
In-sample predictions are not back transformed in ForecasterAutoreg()
The problem
To generate in-sample predictions (aka fitted values), you need to create the training matrices with .create_train_X_y() and pass them to .predict() on the internal regressor, as described in the docs. But when a transformer is given to ForecasterAutoreg(), the in-sample predictions do not appear to be reverted to the original scale of the data.
Am I missing something, or is there a way to revert the transformation?
Reproducible example
# Libraries
# ==============================================================================
import pandas as pd
import numpy as np
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.datasets import fetch_dataset
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import PowerTransformer
# Download data
# ==============================================================================
data = fetch_dataset(name = "h2o_exog", raw = False)["y"]
# Split train-test
# ==============================================================================
steps = 36
data_train = data[:-steps]
data_test = data[-steps:]
# Plot
# ==============================================================================
pd.concat([data_train.rename("train"), data_test.rename("test")], axis = "columns").plot()
# Create and fit forecaster without transformer
# ==============================================================================
forecaster_notrans = ForecasterAutoreg(
    regressor = RandomForestRegressor(random_state = 123),
    lags = 12
)
forecaster_notrans.fit(y = data_train)
forecaster_notrans
# Create training matrices
# ==============================================================================
X_train, y_train = forecaster_notrans.create_train_X_y(data_train)
X_train.head()
# Predict using the internal regressor
# ==============================================================================
predictions_1 = forecaster_notrans.regressor.predict(X_train)
predictions_1[:4]
# Plot predictions
# ==============================================================================
pd.concat([
    pd.Series(predictions_1, name = "fitted", index = data_train.index[forecaster_notrans.max_lag:]),
    data_train.rename("train")
], axis = "columns").plot(title = "No transformer");
# Create and fit forecaster with transformer
# ==============================================================================
forecaster_trans = ForecasterAutoreg(
    regressor = RandomForestRegressor(random_state = 123),
    lags = 12,
    transformer_y = PowerTransformer()
)
forecaster_trans.fit(y = data_train)
forecaster_trans
# Create training matrices
# ==============================================================================
X_train, y_train = forecaster_trans.create_train_X_y(data_train)
X_train.head()
# Predict using the internal regressor
# ==============================================================================
predictions_1 = forecaster_trans.regressor.predict(X_train)
predictions_1[:4]
# Plot predictions
# ==============================================================================
pd.concat([
    pd.Series(predictions_1, name = "fitted", index = data_train.index[forecaster_trans.max_lag:]),
    data_train.rename("train")
], axis = "columns").plot(title = "With transformer");
# Out of sample predictions are OK
# ==============================================================================
predictions_3 = forecaster_trans.predict(steps = steps)
predictions_3.head(3)
# Plot predictions
# ==============================================================================
pd.concat([
    pd.Series(predictions_1, name = "fitted", index = data_train.index[forecaster_trans.max_lag:]),
    data_train.rename("train"),
    data_test.rename("test"),
    predictions_3.rename("forecast")
], axis = "columns").plot(title = "With transformer");
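For reference, a minimal sketch of what reverting the fitted values by hand could look like. It does not use skforecast: the lag matrix is built manually, the series is synthetic (hypothetical values), and the names transformer, fitted_scaled and fitted are illustrative only. It just shows that predictions from a regressor trained on transformed data come out on the transformed scale, and that calling inverse_transform on the same fitted transformer brings them back.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import PowerTransformer

# Synthetic positive series standing in for the real data (hypothetical values)
rng = np.random.default_rng(123)
y = 0.5 + 0.3 * np.sin(np.arange(120) * 2 * np.pi / 12) + rng.normal(0, 0.02, 120)
n_lags = 12

# Fit the transformer on the training series and work on the transformed scale
transformer = PowerTransformer()
y_scaled = transformer.fit_transform(y.reshape(-1, 1)).ravel()

# Hand-built lag matrix: the row for target y_scaled[t] holds y_scaled[t-1], ..., y_scaled[t-n_lags]
X_train = np.column_stack(
    [y_scaled[n_lags - lag : len(y) - lag] for lag in range(1, n_lags + 1)]
)
y_train = y_scaled[n_lags:]

regressor = RandomForestRegressor(random_state=123).fit(X_train, y_train)

# In-sample predictions are on the transformed scale...
fitted_scaled = regressor.predict(X_train)

# ...so they are reverted with the same fitted transformer
fitted = transformer.inverse_transform(fitted_scaled.reshape(-1, 1)).ravel()
```

With the forecaster above, the analogous call would presumably be forecaster_trans.transformer_y.inverse_transform(predictions_1.reshape(-1, 1)).ravel(), assuming the forecaster keeps the fitted transformer in its transformer_y attribute; I have not confirmed that this is the intended way to do it.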
Session information
-----
matplotlib 3.7.1
numpy 1.25.2
pandas 2.0.3
session_info 1.0.0
skforecast 0.12.1
sklearn 1.2.2
-----
Modules imported as dependencies
PIL 9.4.0
backcall 0.2.0
certifi 2024.06.02
cffi 1.16.0
cloudpickle 2.2.1
cycler 0.12.1
cython_runtime NA
dateutil 2.8.2
debugpy 1.6.6
decorator 4.4.2
defusedxml 0.7.1
google NA
httplib2 0.22.0
ipykernel 5.5.6
ipython_genutils 0.2.0
joblib 1.4.2
kiwisolver 1.4.5
matplotlib_inline 0.1.7
mpl_toolkits NA
numexpr 2.10.0
packaging 24.1
pexpect 4.9.0
pickleshare 0.7.5
pkg_resources NA
platformdirs 4.2.2
portpicker NA
prompt_toolkit 3.0.47
psutil 5.9.5
ptyprocess 0.7.0
pyarrow 14.0.2
pydev_ipython NA
pydevconsole NA
pydevd 2.9.5
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.16.1
pyparsing 3.1.2
pytz 2023.4
scipy 1.11.4
setuptools 67.7.2
sitecustomize NA
six 1.16.0
socks 1.7.1
sphinxcontrib NA
storemagic NA
threadpoolctl 3.5.0
tornado 6.3.3
traitlets 5.7.1
typing_extensions NA
wcwidth 0.2.13
zmq 24.0.1
zoneinfo NA
-----
IPython 7.34.0
jupyter_client 6.1.12
jupyter_core 5.7.2
notebook 6.5.5
-----
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
Linux-6.1.85+-x86_64-with-glibc2.35
-----
Session information updated at 2024-06-14 11:16