Informer2020

Prediction Output Length Issue

Open CarsonLoi opened this issue 2 years ago • 3 comments

I fed source data into the trained model for prediction (say 4096 records x 9 features). With seq_len, label_len and pred_len set to 168 / 24 / 1 respectively, I expected 4096 - 168 = 3928 output records. However, pred_loader generates only 1 row of data and the model outputs only 1 result.

Could you advise how I can resolve this issue? Many thanks.

CarsonLoi, Aug 11 '22 09:08

Could you please share your dataloader.py file?

zhouhaoyi, Aug 17 '22 03:08

Dear author, thanks for your reply. I think I am using the same dataloader that you provided in Colab:

class Dataset_Pred(Dataset):
    def __init__(self, root_path, flag='pred', size=None,
                 features='S', data_path='ETTh1.csv',
                 target='OT', scale=True, inverse=False, timeenc=0, freq='15min', cols=None):
        # size [seq_len, label_len, pred_len]
        # info
        if size == None:
            self.seq_len = 24*4*4
            self.label_len = 24*4
            self.pred_len = 24*4
        else:
            self.seq_len = size[0]
            self.label_len = size[1]
            self.pred_len = size[2]
        # init
        assert flag in ['pred']

        self.features = features
        self.target = target
        self.scale = scale
        self.inverse = inverse
        self.timeenc = timeenc
        self.freq = freq
        self.cols=cols
        self.root_path = root_path
        self.data_path = data_path
        self.__read_data__()

    def __read_data__(self):
        self.scaler = StandardScaler()
        df_raw = pd.read_csv(os.path.join(self.root_path,
                                          self.data_path))
        '''
        df_raw.columns: ['date', ...(other features), target feature]
        '''
        if self.cols:
            cols=self.cols.copy()
            cols.remove(self.target)
        else:
            cols = list(df_raw.columns); cols.remove(self.target); cols.remove('date')
        df_raw = df_raw[['date']+cols+[self.target]]

        # Only the last seq_len rows are kept for prediction
        border1 = len(df_raw)-self.seq_len
        border2 = len(df_raw)

        if self.features=='M' or self.features=='MS':
            cols_data = df_raw.columns[1:]
            df_data = df_raw[cols_data]
        elif self.features=='S':
            df_data = df_raw[[self.target]]

        if self.scale:
            self.scaler.fit(df_data.values)
            data = self.scaler.transform(df_data.values)
        else:
            data = df_data.values

        tmp_stamp = df_raw[['date']][border1:border2]
        tmp_stamp['date'] = pd.to_datetime(tmp_stamp.date)
        pred_dates = pd.date_range(tmp_stamp.date.values[-1], periods=self.pred_len+1, freq=self.freq)

        df_stamp = pd.DataFrame(columns = ['date'])
        df_stamp.date = list(tmp_stamp.date.values) + list(pred_dates[1:])
        data_stamp = time_features(df_stamp, timeenc=self.timeenc, freq=self.freq[-1:])

        self.data_x = data[border1:border2]
        if self.inverse:
            self.data_y = df_data.values[border1:border2]
        else:
            self.data_y = data[border1:border2]
        self.data_stamp = data_stamp

    def __getitem__(self, index):
        s_begin = index
        s_end = s_begin + self.seq_len
        r_begin = s_end - self.label_len
        r_end = r_begin + self.label_len + self.pred_len

        seq_x = self.data_x[s_begin:s_end]
        if self.inverse:
            seq_y = self.data_x[r_begin:r_begin+self.label_len]
        else:
            seq_y = self.data_y[r_begin:r_begin+self.label_len]
        seq_x_mark = self.data_stamp[s_begin:s_end]
        seq_y_mark = self.data_stamp[r_begin:r_end]

        return seq_x, seq_y, seq_x_mark, seq_y_mark

    def __len__(self):
        return len(self.data_x) - self.seq_len + 1

    def inverse_transform(self, data):
        return self.scaler.inverse_transform(data)

CarsonLoi, Aug 25 '22 08:08

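Dataset_Pred uses only the last seq_len rows of the data to predict the future: border1 is set to len(df_raw) - seq_len, so data_x holds exactly one input window and __len__ returns 1. To get a prediction for every window, you can modify it along the lines of the other dataset classes. The result mainly depends on how the borders (border1 and border2) are chosen.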
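
The arithmetic behind this can be checked with a few lines of plain Python. This is a minimal sketch, not code from the repository: the helper name num_windows is hypothetical, and it only assumes the slicing data[border1:len(df_raw)] and the __len__ formula from the class posted above.

```python
def num_windows(n_rows: int, seq_len: int, border1: int) -> int:
    """Samples reported by __len__ when data_x = data[border1:n_rows]."""
    kept = n_rows - border1      # rows that survive the border slice
    return kept - seq_len + 1    # same formula as Dataset_Pred.__len__


n_rows, seq_len = 4096, 168

# Dataset_Pred as posted: border1 = len(df_raw) - seq_len
only_last = num_windows(n_rows, seq_len, border1=n_rows - seq_len)

# Hypothetical change (an assumption, not the repository default): border1 = 0
all_rows = num_windows(n_rows, seq_len, border1=0)

print(only_last, all_rows)
```

With the posted borders the count is 1. With border1 = 0 it is 4096 - 168 + 1 = 3929, one more than the 3928 estimated in the question, because the window that ends on the last row is also counted.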

MountVoom, Sep 06 '22 07:09
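
One possible shape for such a modification is sketched below. It is an illustration under stated assumptions, not the repository's implementation: the class name SlidingPredWindows is hypothetical, plain Python lists stand in for the scaled NumPy arrays, and the time-stamp marks (seq_x_mark, seq_y_mark) and scaling are left out to keep it self-contained. The index arithmetic mirrors __getitem__ from the class posted above, with every row kept (the equivalent of border1 = 0).

```python
class SlidingPredWindows:
    """Hypothetical stand-in for Dataset_Pred that keeps every row."""

    def __init__(self, data, seq_len, label_len, pred_len):
        self.data = data
        self.seq_len = seq_len
        self.label_len = label_len
        self.pred_len = pred_len  # kept for parity; unused without time marks

    def __len__(self):
        # One sample per complete encoder window.
        return len(self.data) - self.seq_len + 1

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        s_begin = index
        s_end = s_begin + self.seq_len
        r_begin = s_end - self.label_len
        seq_x = self.data[s_begin:s_end]  # encoder input, seq_len rows
        seq_y = self.data[r_begin:s_end]  # known decoder prefix, label_len rows
        return seq_x, seq_y


# Toy data: 4096 rows x 9 features, each row filled with its own row index.
data = [[float(i)] * 9 for i in range(4096)]
windows = SlidingPredWindows(data, seq_len=168, label_len=24, pred_len=1)

first_x, first_y = windows[0]
last_x, last_y = windows[len(windows) - 1]
print(len(windows), len(first_x), len(first_y), last_x[0][0], last_x[-1][0])
```

In the real class the date stamps would also need to cover all rows plus pred_len future steps, which the posted __read_data__ already does relative to whatever border1 is chosen.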