
TimeSeriesDataSet usage

dialuser opened this issue 2 years ago · 3 comments

  • PyTorch-Forecasting version: 0.10.3
  • PyTorch version: 1.13.1
  • Python version: 3.8.10
  • Operating System: Ubuntu

Hi, I'm new to PyTorch-Forecasting and have a few newbie questions that I hope someone can help with. I have a single time series for testing out TemporalFusionTransformer, with a total length of 10,000 samples. I'd like to use the first 5,000 data points for training and validation, and then perform prediction (in a hindcasting sense) on the remaining samples. My question is: how do I run the trained model on each sample of the test data in a sliding-window fashion? That is, starting after the first max_encoder_length points, I'd like to predict the 1-day-ahead value one step at a time, so the resulting prediction length should be 5000 - max_encoder_length. Could you quickly look at my test_loader to see if it makes sense?

Many thanks.

    # first add a group column; since I have a single time series, I use a constant group_id = 0
    data['group'] = np.zeros((data.shape[0]), dtype=np.int8)
    data['group'] = data['group'].astype(str).astype('category')

    max_encoder_length = 30 #30 days
    max_prediction_length = 1  #  1-day-ahead prediction

    training_cutoff = 4000
    validation_cutoff = 5000
    
    training = TimeSeriesDataSet(
        data[lambda x: x.time_idx<training_cutoff],
        time_idx="time_idx",
        target="Q",
        group_ids=['group'],
        min_encoder_length=max_encoder_length,
        max_encoder_length=max_encoder_length,
        min_prediction_length=max_prediction_length,
        max_prediction_length=max_prediction_length,
        time_varying_known_categoricals=[],  
        time_varying_unknown_categoricals=[],
        time_varying_unknown_reals=["A", "B",  "C",  "D", "E", "Q"],
        time_varying_known_reals=["time_idx"],
        add_relative_time_idx=False,
        add_target_scales=True,
        static_categoricals=['group'],
        randomize_length=None,
        target_normalizer=GroupNormalizer(groups=["group"]),
        scalers={},
    )    
    
    validation = TimeSeriesDataSet.from_dataset(training, 
        data[ (data.time_idx >= training_cutoff) & (data.time_idx<validation_cutoff)],
        predict=False, stop_randomization=True 
    )
    
    testing = TimeSeriesDataSet.from_dataset(training, 
        data[lambda x: x.time_idx>=validation_cutoff],
        stop_randomization=True, predict=False)    
    # training
    # ....
    # prediction
    raw_predictions, x = best_tft.predict(test_dataloader, mode="raw", return_x=True)

dialuser avatar Jan 18 '23 23:01 dialuser

    validation = TimeSeriesDataSet.from_dataset(
        training, data, predict=predict, stop_randomization=True,
        min_prediction_idx=training_cutoff + 1,
    )

Please add min_prediction_idx, set to training_cutoff or validation_cutoff depending on your need.

sairamtvv avatar Jan 24 '23 09:01 sairamtvv
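To illustrate what min_prediction_idx does, here is a minimal sketch in plain numpy (not the library's internals): only windows whose first decoder time index is at or past min_prediction_idx become validation samples, while encoders may still reach back into the training range. The concrete numbers (series length 10,000, cutoff 4,000) are taken from the question above.

```python
import numpy as np

# Hypothetical single series, as in the question
max_encoder_length = 30
max_prediction_length = 1
training_cutoff = 4000
time_idx = np.arange(10_000)

# Every possible decoder start position for a 1-step-ahead window
# (the first max_encoder_length points can only serve as encoder history)
decoder_starts = time_idx[max_encoder_length:]

# With min_prediction_idx=training_cutoff + 1, validation keeps only
# windows whose prediction falls after the training cutoff
valid_starts = decoder_starts[decoder_starts >= training_cutoff + 1]

print(len(valid_starts))  # 5999 candidate validation windows
```

This is why the full DataFrame `data` can be passed to `from_dataset` here: the filtering happens via min_prediction_idx rather than by slicing the rows, so encoder windows near the cutoff still see their full history.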

Hi, sorry for the late response. I read in the docs that "if predict=True this will take for each time series identified by group_ids the last max_prediction_length samples of each time series as prediction". So for the test data, can I set max_prediction_length to the length of the test data while turning on predict=True?

    testing = TimeSeriesDataSet.from_dataset(
        training,
        data[lambda x: x.time_idx >= validation_cutoff],
        stop_randomization=True,
        predict=True,
        max_prediction_length=the_length_of_testdata,
    )

dialuser avatar Feb 06 '23 23:02 dialuser

If you do this, you are predicting the entire test set in a single forward window, not in a sliding-window fashion.

ivanightingale avatar Feb 13 '24 03:02 ivanightingale
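For what it's worth, the original setup may already do what was asked: with max_prediction_length=1 and predict=False (as in the `testing` dataset above), the dataset emits one sample per admissible decoder position, so calling predict() over the test dataloader yields 1-day-ahead forecasts in a sliding window. A plain-numpy sketch of the bookkeeping, using the question's numbers (test half of 5,000 points, encoder length 30):

```python
import numpy as np

max_encoder_length = 30
validation_cutoff = 5000
n_total = 10_000

# The held-out second half of the series
test_time_idx = np.arange(validation_cutoff, n_total)

# The first max_encoder_length points only provide encoder history;
# every later point gets its own 1-step-ahead prediction
predicted_idx = test_time_idx[max_encoder_length:]

print(len(predicted_idx))  # 4970 one-step-ahead predictions
```

That matches the expected prediction length of 5000 - max_encoder_length from the question, one forecast per time step.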