pytorch-forecasting

[BUG] Training data time_idx should be end-aligned

Open workhours opened this issue 6 months ago • 6 comments

When I try a dataset with time series like:

  • ts1: years 2001 to 2020 => time_idx=[0,19]
  • ts2: years 2010 to 2020 => time_idx=[10,19]

with:

  • min_prediction_idx=6
  • min_prediction_length=5

ts2 is removed from the training data, since 5+6 > len(ts2). But ts2 can actually be used for training, e.g. time_idx[10:15] as input and time_idx[15:20] as target; in this case the prediction index is 15 > 6.

I don't know whether the current implementation expects time_idx to be start-aligned, i.e.:

  • ts1 => time_idx=[0,19]
  • ts2 => time_idx=[0,9]

If that is the case, ts2 will expose future information to the model during training. So TimeSeriesDataSet at least needs a configuration option stating whether time_idx is start-aligned or end-aligned. Most time series prediction tasks forecast the latest n years, and some groups in the data have always existed for fewer than n years.
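The arithmetic behind this scenario can be sketched in plain Python. This is only an illustration and makes no pytorch-forecasting calls. It assumes an encoder length of 5 (taken from the 10–14 / 15–19 example) and reads min_prediction_idx as the earliest allowed first prediction step. The helper names `last_window` and `passes_min_prediction_idx` are hypothetical.

```python
# Illustrative only: plain Python, no pytorch-forecasting calls.
ENCODER_LENGTH = 5        # assumption, taken from the 10-14 / 15-19 example
PREDICTION_LENGTH = 5     # min_prediction_length in the report
MIN_PREDICTION_IDX = 6    # min_prediction_idx in the report


def last_window(time_idx):
    """Split the final ENCODER_LENGTH + PREDICTION_LENGTH steps into (encoder, decoder)."""
    idx = list(time_idx)
    needed = ENCODER_LENGTH + PREDICTION_LENGTH
    if len(idx) < needed:
        return None  # series too short for one full window
    tail = idx[-needed:]
    return tail[:ENCODER_LENGTH], tail[ENCODER_LENGTH:]


def passes_min_prediction_idx(window):
    """True if the first predicted step is at or after MIN_PREDICTION_IDX."""
    _, decoder = window
    return decoder[0] >= MIN_PREDICTION_IDX


# ts2 on the shared axis (end-aligned with ts1): time_idx 10..19
shared = last_window(range(10, 20))
# ts2 re-indexed from its own first observation (start-aligned): time_idx 0..9
relative = last_window(range(0, 10))

print(shared, passes_min_prediction_idx(shared))
print(relative, passes_min_prediction_idx(relative))
```

Under these assumptions the shared-axis window starts predicting at 15 and passes the check, while the start-aligned copy of the same series starts predicting at 5 and fails it.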

workhours avatar May 29 '25 14:05 workhours

And if the software supports only start alignment, then min_prediction_idx will filter out most of the training data, namely every series whose length is less than the maximum series length.

workhours avatar May 29 '25 14:05 workhours

could you post a full piece of code with all imports, and explain:

  • what happens
  • what you think should happen (e.g., "output should be ...")

fkiraly avatar Jun 05 '25 17:06 fkiraly

Add a time_idx_alignment parameter to TimeSeriesDataSet, supporting:

"start" (default, current behavior)

"end" (align prediction window to the end of each series)

"sliding" (allow prediction windows anywhere they fit — most flexible)

This would allow valid windows from shorter time series to contribute to training without leaking future information.

kentstone84 avatar Jun 06 '25 03:06 kentstone84

Could you give some code examples?

fkiraly avatar Jun 06 '25 15:06 fkiraly

Thanks for the follow-up. The current "start"-aligned logic ends up excluding valid sequences from shorter time series, even when they have enough data to form a valid encoder + decoder window without leaking future targets.

Adding a time_idx_alignment parameter to TimeSeriesDataSet would make this more flexible. Proposed options:

"start" (default, current behavior)

"end": aligns prediction window to the end of each series

"sliding": allows valid prediction windows anywhere they fit in the series

This would let shorter time series like ts2 contribute to training — using, for example, time_idx 10–14 as input and 15–19 as target — without exposing future information. It's especially useful for real-world data where sequence lengths often vary.

    # Proposal: add a `time_idx_alignment` parameter to TimeSeriesDataSet
    from pytorch_forecasting import TimeSeriesDataSet

    # `data` is a pandas DataFrame with time_idx, target and series_id columns
    dataset = TimeSeriesDataSet(
        data,
        time_idx="time_idx",
        target="target",
        group_ids=["series_id"],
        max_encoder_length=5,
        max_prediction_length=5,
        time_idx_alignment="end",  # proposed; options: "start" (default), "end", "sliding"
    )
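To make the three options concrete, here is a minimal sketch of how they might select window positions. It is one possible reading of the proposal, not a description of how the library behaves today. The function `window_starts` and its arguments are hypothetical, and "start" is assumed to mean the earliest window that fits.

```python
def window_starts(first_idx, last_idx, encoder_length, prediction_length,
                  alignment="start"):
    """Return the time_idx values at which an encoder window may begin.

    Hypothetical helper: `alignment` mirrors the proposed time_idx_alignment option.
    first_idx and last_idx are the inclusive time_idx bounds of one series.
    """
    window = encoder_length + prediction_length
    earliest = first_idx
    latest = last_idx - window + 1
    if latest < earliest:
        return []  # series too short for even one window
    if alignment == "start":
        return [earliest]
    if alignment == "end":
        return [latest]
    if alignment == "sliding":
        return list(range(earliest, latest + 1))
    raise ValueError(f"unknown alignment: {alignment!r}")


# ts2 from the report: time_idx 10..19, encoder 5, prediction 5
print(window_starts(10, 19, 5, 5, "end"))      # encoder 10-14, target 15-19
# ts1 from the report: time_idx 0..19
print(window_starts(0, 19, 5, 5, "sliding"))   # every start that still fits
```

With these definitions, ts2 contributes exactly one window under every option, because its ten steps leave room for only one encoder plus decoder pair.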

kentstone84 avatar Jun 06 '25 20:06 kentstone84

"sliding" behaves like a strided window with dynamic indexing, allowing the model to learn from all eligible subsequences, especially important for sparse or short time series

kentstone84 avatar Jun 06 '25 20:06 kentstone84
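The strided-window behaviour described for "sliding" could look roughly like the generator below. This is a sketch under assumptions: `sliding_windows`, its `stride` argument and the 12-step example series are all hypothetical and not part of pytorch-forecasting.

```python
def sliding_windows(values, encoder_length, prediction_length, stride=1):
    """Yield (encoder, target) slices for every window that fits in `values`."""
    window = encoder_length + prediction_length
    # range() is empty when the series is shorter than one full window
    for start in range(0, len(values) - window + 1, stride):
        split = start + encoder_length
        yield values[start:split], values[split:start + window]


short_series = list(range(10, 22))   # hypothetical 12-step series
pairs = list(sliding_windows(short_series, encoder_length=5, prediction_length=5))
for encoder, target in pairs:
    print(encoder, "->", target)
```

A 12-step series yields three training pairs here instead of one, which is the gain for short series that the comment above refers to. A larger stride trades some of those pairs for less overlap between samples.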