sktime [BUG] Time series models built as reductions to regression models are "confusing", with "unreasonable" dependencies between "fit" and "predict"

This is not exactly a bug, but rather a "confusing" implementation problem.

I think the architecture of the time series models based on regression models is a little confusing: there are too many variants, yet some cases are still missing.

We start with some "ideal" requirements:

window_length: length of the window used for training, if there is one.
prediction_length: length of the window to predict. At the moment this is the 'fh' passed to 'fit' ('fh_in_fit'), BUT this approach breaks the 'standard interface', where we have 'fit(y, X=None)'. It would be better to move this information to the configuration level, alongside 'window_length', OR at minimum to permit both approaches, AND to remove the 'fh_in_fit' requirement.
fh: forecasting horizon requested at prediction time. It can be SHORTER than, EQUAL to, or LONGER than the prediction_length. These two pieces of information must be kept separate, which increases the flexibility of the models. It should not be mandatory to know, at training time, which fh will be used in prediction. Sometimes it will be the same, sometimes not.

Now, the regressor models can be used in two distinct modes:

model: window_length -> prediction_length. The model is trained to receive 'window_length' input values AND it generates 'prediction_length' predictions in a SINGLE step.
model[i]: window_length -> 1. There are 'prediction_length' models, all receiving 'window_length' input values. Each generates a SINGLE output, and the prediction generated by model[i] is 'prediction[i]'.
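
To make the two modes concrete, here is a minimal pure-Python sketch of how a series could be cut into training rows. The helper name `make_windows` is hypothetical and is not part of sktime; it only illustrates the windowing, not any actual implementation.

```python
def make_windows(y, window_length, prediction_length):
    """Slide over y and return (X, Y).

    Each X row holds window_length past values; each Y row holds the
    prediction_length values that follow it.
    """
    X, Y = [], []
    last_start = len(y) - window_length - prediction_length
    for start in range(last_start + 1):
        split = start + window_length
        X.append(list(y[start:split]))
        Y.append(list(y[split:split + prediction_length]))
    return X, Y


y = [1, 2, 3, 4, 5, 6, 7]
X, Y = make_windows(y, window_length=3, prediction_length=2)

# Mode 1: a single multi-output regressor would be trained on (X, Y).
# Mode 2: model[i] would be trained on X and the i-th column of Y.
targets_per_model = [[row[i] for row in Y] for i in range(2)]
```

In both modes the input rows X are identical; only the shape of the target differs.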

More complex configurations are possible, but I think it is not necessary to increase the complexity here. There is another area where a more "intelligent" improvement is possible.

The available models are:

"recursive" (RecursiveTabularRegressionForecaster)
- n models: 1
- window_length: any
- prediction_length: 1
- fh: any. WHY is it not possible to use a 'prediction_length' longer than 1?
"multioutput" (MultioutputTabularRegressionForecaster)
- n models: 1
- window_length: any
- prediction_length: fh
- fh: any. If 'fh_in_fit' is not the same as the 'fh' used in prediction, an exception is raised. WHY MUST the 'prediction_length' be equal to 'fh'? Training and prediction are events happening in two different contexts.
"direct" (DirectTabularRegressionForecaster)
- n models: prediction_length
- window_length: any
- prediction_length: fh
- fh: any. If 'fh_in_fit' is not the same as the 'fh' used in prediction, it DOESN'T raise an exception, BUT the result is 'strange' (all zeros for the missing time slots). WHY MUST the 'prediction_length' be equal to 'fh'? Training and prediction are events happening in two different contexts.
"dirrec" (DirRecTabularRegressionForecaster)
- n models: prediction_length
- window_length: any
- prediction_length: fh
- fh: any. If 'fh_in_fit' is not the same as the 'fh' used in prediction, an exception is raised. WHY MUST the 'prediction_length' be equal to 'fh'? Training and prediction are events happening in two different contexts.
  
  Note: this model is strange!
DirectReductionForecaster (experimental): not analyzed yet
RecursiveReductionForecaster (experimental): not analyzed yet

The relation ('window_length'/'prediction_length') -> 'fh' is orthogonal to the choice between a "single model" and "multiple models". A single class can support all the previous cases PLUS the flexibility of keeping 'prediction_length' and 'fh' independent.
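
As a rough illustration of that independence, the sketch below serves an arbitrary 'fh' from a model trained for a fixed 'prediction_length' by rolling the window forward. Everything here is an assumption made for the example: `forecast` and `toy_block` are hypothetical names, `fh` is taken to be a list of 1-based relative steps, and `predict_block` stands in for any trained regressor (one multi-output model or 'prediction_length' single-output models).

```python
import math


def forecast(history, predict_block, window_length, prediction_length, fh):
    """Return forecasts for the 1-based relative steps listed in fh.

    predict_block is a hypothetical callable that maps a window of
    window_length values to prediction_length future values.
    """
    horizon = max(fh)
    n_blocks = math.ceil(horizon / prediction_length)
    series = list(history)
    for _ in range(n_blocks):
        # Feed the latest window, which may already contain predictions.
        window = series[-window_length:]
        series.extend(predict_block(window)[:prediction_length])
    start = len(history)
    return [series[start + step - 1] for step in fh]


def toy_block(window):
    """Toy stand-in for a trained regressor: continue the series by +1, +2."""
    return [window[-1] + 1, window[-1] + 2]


history = [1, 2, 3]
shorter = forecast(history, toy_block, 3, 2, fh=[1])
equal = forecast(history, toy_block, 3, 2, fh=[1, 2])
longer = forecast(history, toy_block, 3, 2, fh=[1, 2, 3, 4, 5])
sparse = forecast(history, toy_block, 3, 2, fh=[2, 5])
```

A shorter 'fh' simply discards the unused predictions, while a longer one triggers additional recursive calls on a window that includes earlier predictions.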

The "intelligent" improvement is to replace 'window_length' and 'prediction_length' with 'lags', that is, an object with the same flexibility as 'fh': it must be possible to specify "exactly" the list of time slots of "X" and of "y" to use as input for the regressor.

Obviously, to support 'recursion' there are limits on the structure of 'fh'. But it is reasonable to consider two cases:

1) IF recursion must be supported, the 'prediction window' must be a contiguous sequence of time slots.
2) IF the prediction window is equal to 'fh', it is possible to predict "specific" time slots.

In "theory" it is possible to relax condition 1), but this introduces complexity without a real "reason".
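
A minimal sketch of the 'lags' idea and of the contiguity condition, under stated assumptions: `tabularize`, `supports_recursion`, `y_lags` and `targets` are hypothetical names and not sktime parameters, offsets are positive integers counted from the cutoff, and exogenous "X" lags are left out for brevity.

```python
def tabularize(y, y_lags, targets):
    """Build regression rows from explicit offsets.

    y_lags: positive offsets into the past (1 = most recent value).
    targets: positive offsets into the future (1 = next value).
    """
    X, Y = [], []
    max_lag, max_target = max(y_lags), max(targets)
    # t indexes the last observed value, i.e. the forecasting cutoff.
    for t in range(max_lag - 1, len(y) - max_target):
        X.append([y[t - lag + 1] for lag in y_lags])
        Y.append([y[t + step] for step in targets])
    return X, Y


def supports_recursion(targets):
    """Recursion needs the predicted slots to be contiguous from step 1."""
    return sorted(targets) == list(range(1, len(targets) + 1))


y = [10, 11, 12, 13, 14, 15, 16, 17]
X, Y = tabularize(y, y_lags=[1, 3], targets=[2])
```

With `targets=[2]` the model predicts one "specific" time slot, so it can only be used when the prediction window equals 'fh'; with `targets=[1, 2, 3]` the window is contiguous and recursion becomes possible.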

Jun 23 '24 04:06 corradomio

Have you looked at the rearchitecture, e.g., DirectReductionForecaster? It may solve some of these issues.

Your feedback would be appreciated! See #3224

Jun 23 '24 12:06 fkiraly

> Have you looked at the rearchitecture, e.g., DirectReductionForecaster? It may solve some of these issues.

Work in progress ;-)

Consider that I already have my own implementation, which I used to replace all 4 previous classes and which supports all the requirements (except for "dirrec", but a model where the training window length is not constant is a little "strange").

I analyzed these models more closely to understand whether my model was able to replicate the behavior of the previous ones.

Jun 23 '24 14:06 corradomio

If you have these models - do you want to make a pull request?

Jun 24 '24 18:06 fkiraly