How are samples generated with RegressionModel / MLPRegressor?
Hi there,
I tried sklearn's `MLPRegressor` with the `RegressionModel` wrapper, and to my surprise, I was able to generate samples with `historical_forecasts` (e.g. with `num_samples=1000`). How is this possible? Neither `RegressionModel` nor `MLPRegressor` accepts any kind of likelihood or loss function as an argument, or am I missing something?
`model.supports_probabilistic_prediction` is actually `False` in my case, but I can still generate samples.
I would be happy to use this, but it is not officially supported, right?
Hi @dwolffram,
Can you please share a reproducible code snippet? If the model is not probabilistic, it should indeed not be able to generate samples, but I might be missing something...
Hi @madtoinou,
that's what I thought, but somehow I get samples anyway 😅 Or am I doing something wrong?
```python
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPRegressor

from darts import concatenate
from darts.datasets import AirPassengersDataset
from darts.models.forecasting.regression_model import RegressionModel

series = AirPassengersDataset().load()
validation_start = 60

# plain sklearn MLP, no likelihood or probabilistic loss
mlp = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000)

model = RegressionModel(
    model=mlp,
    output_chunk_length=4,
    multi_models=True,
    lags=4,
)

print(model.supports_probabilistic_prediction)  # False

model.fit(series)

# num_samples=1000 should fail for a non-probabilistic model, but it doesn't
hfc = model.historical_forecasts(
    series=series,
    start=validation_start,
    forecast_horizon=4,
    stride=4,
    last_points_only=False,
    retrain=False,
    verbose=True,
    num_samples=1000,
)
hfc = concatenate(hfc, axis=0)

series.plot()
hfc.plot()
plt.show()
```
I just realized that if I set `output_chunk_length` < 4 (i.e. below `forecast_horizon`), I indeed get the error "ValueError: num_samples > 1 is only supported for probabilistic models." No idea if that helps 🤔
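For reference, a minimal variation of the snippet above that triggers the error; it reuses `mlp`, `series`, and `validation_start` from before, and only `output_chunk_length` changes:

```python
# same setup as above, but output_chunk_length=3, so that
# forecast_horizon (4) now exceeds it
model = RegressionModel(
    model=mlp,
    output_chunk_length=3,
    multi_models=True,
    lags=4,
)
model.fit(series)
model.historical_forecasts(
    series=series,
    start=validation_start,
    forecast_horizon=4,
    stride=4,
    last_points_only=False,
    retrain=False,
    num_samples=1000,
)
# ValueError: num_samples > 1 is only supported for probabilistic models.
```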
I did a bit of investigating, and this is the combination of several things:
- the optimized historical forecasts method is called because `retrain=False` and `forecast_horizon <= output_chunk_length`. This method does not rely on `predict()` and parallelizes all the predictions to speed things up. During this parallelization, it also duplicates the axes along the `num_samples` dimension.
- if you look at the forecast, all the samples for a given position/time are exactly identical (with `output_chunk_length` different values); see the check after this list.
- in the plotting, the quantiles are taken from these repeated values.
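A quick way to check this, reusing `hfc` from the snippet above (in darts, `TimeSeries.all_values()` returns an array of shape `(n_times, n_components, n_samples)`):

```python
import numpy as np

vals = hfc.all_values()  # shape: (n_times, n_components, n_samples)
# count how many distinct values hide among the 1000 "samples" at each time step
n_unique = [np.unique(vals[t, 0, :]).size for t in range(vals.shape[0])]
# far smaller than 1000: per the discussion, at most output_chunk_length
# distinct values, just repeated - not independent draws
print(max(n_unique))
```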
An easy way to prevent this would be to add a sanity check on the `num_samples` argument (which is usually taken care of by `predict()`) in the optimized historical forecasts routine, as sketched below. There might be more to it, notably where the predictions are reshaped, but that would require further testing.
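A minimal sketch of such a guard, assuming access to the wrapped model and the `num_samples` argument inside that routine (the names here are hypothetical; the actual check in `predict()` may differ):

```python
# hypothetical guard at the top of the optimized historical forecasts routine,
# mirroring the check that predict() performs on the non-optimized path
if num_samples > 1 and not model.supports_probabilistic_prediction:
    raise ValueError(
        "`num_samples > 1` is only supported for probabilistic models."
    )
```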
Thanks for looking into it!
@madtoinou, why are the quantiles so wide then? Wouldn't 1000 identical predictions imply zero uncertainty?
Because there are still `output_chunk_length` different forecasts, due to the erroneous shape of the input tensor.
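A toy illustration (with made-up numbers) of why a handful of repeated values still produces a wide quantile band:

```python
import numpy as np

# 1000 "samples" that are really just 4 distinct values repeated 250 times each
samples = np.tile([100.0, 110.0, 120.0, 130.0], 250)
# the band spans the spread of those 4 values, despite no real sampling
print(np.quantile(samples, [0.05, 0.95]))  # [100. 130.]
```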