pytorch-forecasting
TimeSeriesDataSet usage
- PyTorch-Forecasting version: 0.10.3
- PyTorch version: 1.13.1
- Python version: 3.8.10
- Operating System: Ubuntu
Hi, I'm new to PyTorch-Forecasting and have some beginner questions that I hope someone here can help with. I have a single time series for testing out TemporalFusionTransformer. The total sample length is 10,000. I'd like to use the first 5,000 data points for training and validation, and then run predictions (in a hindcasting sense) on the remaining samples. My question is: how do I run the trained model over the test data in a sliding-window fashion? That is, starting at max_encoder_length, I'd like to predict the 1-day-ahead value, one step after another. The resulting number of predictions should be 5000 - max_encoder_length. Could you take a quick look at my test_loader to see whether it makes sense?
Many thanks.
import numpy as np
from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.data import GroupNormalizer

# `data` is my pandas DataFrame with columns time_idx, A, B, C, D, E and Q.
# First add a group column; I have a single time series, so I use a constant group id of 0.
data['group'] = np.zeros((data.shape[0]), dtype=np.int8)
data['group'] = data['group'].astype(str).astype('category')

max_encoder_length = 30      # 30 days of history
max_prediction_length = 1    # 1-day-ahead prediction
training_cutoff = 4000       # training uses time_idx 0..3999
validation_cutoff = 5000     # validation uses time_idx 4000..4999

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx < training_cutoff],
    time_idx="time_idx",
    target="Q",
    group_ids=['group'],
    min_encoder_length=max_encoder_length,
    max_encoder_length=max_encoder_length,
    min_prediction_length=max_prediction_length,
    max_prediction_length=max_prediction_length,
    time_varying_known_categoricals=[],
    time_varying_unknown_categoricals=[],
    time_varying_unknown_reals=["A", "B", "C", "D", "E", "Q"],
    time_varying_known_reals=["time_idx"],
    add_relative_time_idx=False,
    add_target_scales=True,
    static_categoricals=['group'],
    randomize_length=None,
    target_normalizer=GroupNormalizer(groups=["group"]),
    scalers={},
)

# Reuse the training dataset's encoders and scalers for validation and testing.
validation = TimeSeriesDataSet.from_dataset(
    training,
    data[(data.time_idx >= training_cutoff) & (data.time_idx < validation_cutoff)],
    predict=False,
    stop_randomization=True,
)
testing = TimeSeriesDataSet.from_dataset(
    training,
    data[lambda x: x.time_idx >= validation_cutoff],
    stop_randomization=True,
    predict=False,
)

# training
# ....

# prediction
# Assumed here: the loader is built from `testing`; the batch size is a placeholder.
test_dataloader = testing.to_dataloader(train=False, batch_size=64, num_workers=0)
raw_predictions, x = best_tft.predict(test_dataloader, mode="raw", return_x=True)
validation = TimeSeriesDataSet.from_dataset(training, data, predict=False, stop_randomization=True, min_prediction_idx=training_cutoff + 1)
Please pass the full data and set min_prediction_idx to training_cutoff or validation_cutoff, depending on your needs.
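To see why min_prediction_idx matters here, the following is a minimal sketch of the index arithmetic only. It does not call pytorch-forecasting; the helper first_decoder_indices is hypothetical and assumes a fixed encoder length, a fixed prediction length, and a gap-free time_idx from 0 to 9999. Because the question's test frame is selected with time_idx >= validation_cutoff, the sketch uses validation_cutoff itself as the first index to predict.

```python
# Simplified index arithmetic; this is not the library's implementation.
max_encoder_length = 30
max_prediction_length = 1
validation_cutoff = 5000
n_total = 10_000  # time_idx runs from 0 to 9999


def first_decoder_indices(first_idx, last_idx, min_prediction_idx=None):
    """Return the time_idx at which each window's prediction starts.

    A window needs max_encoder_length steps of history before its first
    predicted step and max_prediction_length steps from that step onward.
    """
    start = first_idx + max_encoder_length
    if min_prediction_idx is not None:
        start = max(start, min_prediction_idx)
    stop = last_idx - max_prediction_length + 1  # last valid start, inclusive
    return list(range(start, stop + 1))


# Option A: slice the frame at validation_cutoff, as in the question.
sliced = first_decoder_indices(validation_cutoff, n_total - 1)

# Option B: pass the full frame and set min_prediction_idx.
full = first_decoder_indices(0, n_total - 1, min_prediction_idx=validation_cutoff)

print(len(sliced), sliced[0])  # 4970 5030
print(len(full), full[0])      # 5000 5000
```

Under these assumptions, slicing the frame loses the first 30 test days to the encoder, which matches the 5000 - max_encoder_length count in the question, while passing the full frame lets the encoder reach back into time_idx 4970..4999 so that every test day gets a prediction.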
Hi, sorry for the late response. The docs say that "if predict=True this will take for each time series identified by group_ids the last max_prediction_length samples of each time series as prediction". So for the test data, can I set max_prediction_length to the length of the test data and turn on predict=True?
testing = TimeSeriesDataSet.from_dataset(training, data[lambda x: x.time_idx>=validation_cutoff], stop_randomization=True, predict=True, max_prediction_length = the_length_of_testdata)
If you do this, then you are predicting the entirety of the testing set all at once, not in a sliding window fashion.
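The difference can be sketched with plain index arithmetic. The helper decoder_ranges below is hypothetical and only models which positions of the test slice each window predicts; it is not pytorch-forecasting code. It also assumes the one-shot horizon is shortened to n_test - encoder_length so that the single window still has 30 steps of history.

```python
def decoder_ranges(n_steps, encoder_length, prediction_length, predict):
    """List (first, last) predicted positions for each window.

    Positions run from 0 to n_steps - 1 within the test slice.
    predict=True keeps only the final window; predict=False keeps
    every window that has a full encoder in front of it.
    """
    last_start = n_steps - prediction_length
    if last_start < encoder_length:
        raise ValueError("not enough history for a full encoder")
    starts = [last_start] if predict else range(encoder_length, last_start + 1)
    return [(s, s + prediction_length - 1) for s in starts]


n_test = 5000
encoder_length = 30

# predict=True with a horizon covering the rest of the test slice.
one_shot = decoder_ranges(n_test, encoder_length, n_test - encoder_length, predict=True)

# predict=False with a 1-step horizon, as in the original question.
sliding = decoder_ranges(n_test, encoder_length, 1, predict=False)

print(one_shot)                  # [(30, 4999)]
print(len(sliding), sliding[0])  # 4970 (30, 30)
```

Both variants cover positions 30 to 4999, but the one-shot variant is a single 4970-step-ahead forecast from one 30-day encoder window, whereas the sliding variant is 4970 separate 1-day-ahead forecasts, each with its own most recent 30 days of history. The one-shot horizon is also far longer than the max_prediction_length of 1 used for training.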