
[QUESTION] Is it possible to compute single point forecasting loss?

Open giacomoguiduzzi opened this issue 9 months ago • 8 comments

Greetings, I searched the documentation and the issues but couldn't find a clear answer to what I have in mind. Let's say I want to forecast the value 7 days in the future; for that I would normally set output_chunk_length=7. What I noticed is that doing so requires labels of the same length. Let me explain in further detail.

I am using a custom dataloader class that I wrote by extending the corresponding dataset class for each model (e.g. MixedCovariatesSequentialDataset for DLinear), because I wanted to control the shape and kind of data I feed the models. I am trying to do multivariate-to-single-point forecasting, so I'm looking for a way to evaluate only the last of the 7 predicted values instead of the whole sequence, since my windows have only one value associated with them: the target variable output_chunk_length steps in the future.

In other state-of-the-art models like PatchTST, which I'm testing, I can take the last value of the forecast with something like outputs = output[:, -1, -1], where the first dimension is the batch size, the second is the forecast length, and the last is the number of forecasted features (in my case there is only one). I have taken a look at how darts works under the hood through the PyTorch Lightning trainer by testing DLinear, which practically has the same code as PatchTST in the GitHub repos, but I couldn't find a way to tell the darts model "compute the MSE loss only over the last value instead of over the last output_chunk_length values".

Can you give me any insight on how to proceed? Thank you in advance, I'm looking forward to your kind response. I'm at your disposal for whatever doubt you have or information you need.
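To make the goal concrete, this is roughly what I mean in plain PyTorch terms (the tensor shapes here are just illustrative):

    import torch
    import torch.nn.functional as F

    # illustrative shapes: output is (batch, forecast_length, n_features)
    output = torch.randn(32, 7, 1)        # model forecast over 7 steps
    target = torch.randn(32)              # label: only the value 7 steps ahead
    last_step = output[:, -1, -1]         # keep the last step of the last feature
    loss = F.mse_loss(last_step, target)  # MSE computed on that single point only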

Best Regards, Giacomo Guiduzzi

giacomoguiduzzi avatar Mar 27 '25 12:03 giacomoguiduzzi

You should be able to achieve what you want by using output_chunk_shift, which sets the gap between the last observation and the first prediction.

Setting output_chunk_length=1 and output_chunk_shift=6 in your example should result in a model that predicts only a single point seven steps in the future, calculating the loss accordingly.
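For example, something along these lines (the lookback length is just illustrative, assuming darts >= 0.28):

    from darts.models import DLinearModel

    # the model emits a single step, shifted 6 positions past the end of the
    # input window, i.e. the value 7 steps ahead; the training loss is
    # computed only on that point
    model = DLinearModel(
        input_chunk_length=30,    # illustrative lookback
        output_chunk_length=1,    # predict a single point...
        output_chunk_shift=6,     # ...located 7 steps after the last observation
    )
    model.fit(series)             # 'series' is your target TimeSeries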

eschibli avatar Mar 27 '25 15:03 eschibli

Hi @eschibli,

Thank you for your answer. So the output_chunk_shift parameter doesn't simply shift the predictions ahead; the model is actually able to learn in a different way through it. I read about it, but I wasn't sure it was what I was looking for. Are there Deep Learning models in darts that do not provide the output_chunk_shift parameter? Is there a workaround for that? Are there any special cases to look out for, e.g., autoregressive models? Thank you once again.

Best Regards, Giacomo Guiduzzi

giacomoguiduzzi avatar Mar 27 '25 16:03 giacomoguiduzzi

output_chunk_shift is supported for all global forecasting models as of 0.28. I believe this includes all of the neural network-based models except possibly the RNN cell (for which it doesn't make sense anyways)

eschibli avatar Mar 27 '25 16:03 eschibli

Thanks for the info @eschibli, I've been looking into this parameter using DLinear as a reference model. I don't quite get whether output_chunk_shift only changes the dataloader's behavior or the model itself. I noticed it is used in torch_forecasting_model.py to define the Datasets, but I couldn't find any point where this parameter could change the model's behavior. Nonetheless, when testing it out as you suggested with output_chunk_length=1 and output_chunk_shift=6, I do get an output that is batch_size x 1 x 1, even though this parameter is not used during the forward pass. How is the model affected by output_chunk_shift, and what difference does it make when using it with output_chunk_length=1 instead of output_chunk_length=7? Is the architecture somehow affected? What am I missing? I'm looking forward to your kind response.

Have a great weekend, Giacomo Guiduzzi

giacomoguiduzzi avatar Mar 28 '25 17:03 giacomoguiduzzi

I'm not intimately familiar with every model implemented in Darts, but for the case of D-Linear, as well as most models I can think of, only the dataset should depend on output_chunk_shift, while the model architecture depends on output_chunk_length. D-Linear, for example, is just a linear map of the trend and residual components of the past values of the targets and covariates, the future covariates, and the static covariates to an (output_chunk_length, n_targets) matrix. A nonzero output_chunk_shift simply changes the timesteps this matrix (and the future covariates) correspond to.
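As a simplified sketch of what the dataset does (not the actual darts code), the shift only moves the target slice:

    import numpy as np

    # a nonzero shift only changes which timesteps the target window covers,
    # not the shape the model has to produce
    def make_window(series, start, input_len, output_len, shift):
        past = series[start : start + input_len]
        future_start = start + input_len + shift
        future = series[future_start : future_start + output_len]
        return past, future

    y = np.arange(100, dtype=float)
    past, future = make_window(y, start=0, input_len=30, output_len=1, shift=6)
    # past covers t = 0..29; future is the single value at t = 36,
    # i.e. 7 steps after the last observation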

eschibli avatar Mar 28 '25 23:03 eschibli

Thank you for confirming my thoughts. This means the parameter is not used at all in my case, because I override the dataset class and supply my sliding windows through fit_from_dataset. My labels are already fixed at a certain number of timesteps in the future, but they are single-point labels of size 1. For this reason, I am confused about how to correctly train models for single-point forecasting through darts: can we say the model learns a "one step ahead" mapping, where that step happens to be, for example, 7 days in the future? Or does it predict at the dataset's sampling frequency, causing a mismatch in the forecasted timestamps (i.e., I'd be comparing the next day with the seventh)? Please give me your thoughts on the matter.

giacomoguiduzzi avatar Mar 29 '25 12:03 giacomoguiduzzi

Hello Giacomo, sorry I've been (and still am) very busy this week and don't have much time.

What exactly are you overwriting? TrainingDataset?

eschibli avatar Apr 02 '25 20:04 eschibli

No worries @eschibli, I'll give you some more context. In issue #2365, I asked for info about controlling the window-creation process, as I wanted to be sure how data was being served to the models. So what I did in practice is extend both PyTorch's Dataset class and the darts dataset class appropriate for each model (e.g., for D-Linear I extended MixedCovariatesSequentialDataset and MixedCovariatesInferenceDataset). Darts requires an instance of a TrainingDataset so it knows what kind of dataset it is working with, but from what I saw it doesn't actually use anything else from it. Having also extended PyTorch's Dataset, I overrode its __len__ and __getitem__ methods so that darts can use them when I call fit_from_dataset(). By aligning what my custom __getitem__ returns with what the model expects from a MixedCovariatesSequentialDataset, I can serve my sliding-window dataset to the models, bypassing darts' data loading. As an example, using D-Linear:

    # assumes: import numpy as np; from typing import Optional, Tuple
    def __getitem__(self, item) -> Tuple[
        np.ndarray,
        Optional[np.ndarray],
        Optional[np.ndarray],
        Optional[np.ndarray],
        Optional[np.ndarray],
        Optional[np.ndarray],
        np.ndarray,
    ]:
        # history of the target column for this window
        past_target = self.data[item][:, [self.target]]
        # all remaining feature columns serve as past covariates
        past_covariate = self.data[item][
            :, [i for i in range(self.num_features) if i != self.target]
        ]
        historic_future_covariate = None
        future_covariate = None
        static_covariate = None
        # single-point label, shaped (output_chunk_length, n_targets) = (1, 1)
        future_target = self.labels[item].reshape(1, 1)

        # copied these names from MixedCovariatesSequentialDataset as a reference;
        # the extra None presumably fills the sample-weight slot
        return (
            past_target,
            past_covariate,
            historic_future_covariate,
            future_covariate,
            static_covariate,
            None,
            future_target,
        )
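For completeness, this is roughly how I pass it to the model (the dataset and variable names here are just illustrative):

    from darts.models import DLinearModel

    # 'SlidingWindowDataset' stands for the custom class sketched above
    train_ds = SlidingWindowDataset(data=windows, labels=labels, target=0)
    model = DLinearModel(input_chunk_length=30, output_chunk_length=1)
    model.fit_from_dataset(train_ds)   # bypasses darts' own window creation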

This is the reason for my question about output_chunk_shift and its consequences; I feel it has no effect in my setup if it only acts through the dataset instance. Let me know what you think, and if you need more information, I'm at your disposal.

giacomoguiduzzi avatar Apr 04 '25 09:04 giacomoguiduzzi