
What is the best way to generate time_idx for a global TFT model with time series of different lengths in the data?

Open strateg17 opened this issue 2 years ago • 5 comments

  • PyTorch-Forecasting version: 0.10.3
  • PyTorch version: 1.13
  • Python version: 3.8
  • Operating System: Ubuntu 18.04

Expected behavior

I am going to create a global model to predict demand for each pair of Store and Product. For this purpose I have written my own functions for time_idx generation. The first function (creates a separate time_idx, different for each time series in the data):

```python
import addict
import pandas as pd


def time_idx_func(x: pd.DataFrame, config: addict.Dict, data_freq: str = "D") -> pd.DataFrame:
    """
    Creates a special integer time index for a given dataframe, based on a classic
    datetime column. The pytorch-forecasting framework requires it.

    Args:
      x (pd.DataFrame): Input dataframe with datetime column.
      data_freq (str): Data frequency passed to pd.date_range.
      config (addict.Dict): Config; config.params.date_column names the date column.

    Returns:
      pd.DataFrame: Result dataframe with integer index column.
    """
    # Gap-free range from the first to the last date found in x.
    date_range = pd.date_range(
        start=x[config.params.date_column].min(),
        end=x[config.params.date_column].max(),
        freq=data_freq,
    )
    date_range_df = pd.DataFrame({config.params.date_column: date_range})
    # Dates are cast to str, so the date column in x must hold matching strings.
    date_range_df[config.params.date_column] = date_range_df[config.params.date_column].astype(str)

    # The position of each date in the range becomes its time_idx.
    date_range_df["time_idx"] = date_range_df.index
    x = pd.merge(x, date_range_df, on=[config.params.date_column], how="left")
    return x
```

The second function (creates a global time_idx, the same for all time series in the data):

```python
import addict
import numpy as np
import pandas as pd


def time_idx_func(x: pd.DataFrame, config: addict.Dict, data_freq: str = "D") -> pd.DataFrame:
    """
    Creates a special integer time index for a given dataframe, based on a classic
    datetime column. The pytorch-forecasting framework requires it.

    Args:
      x (pd.DataFrame): Input dataframe with datetime column.
      data_freq (str): Data frequency passed to pd.date_range.
      config (addict.Dict): Config; config.params.date_column names the date column.

    Returns:
      pd.DataFrame: Result dataframe with integer index column.
    """
    if data_freq == "D":
        # Daily data: gap-free range from the first to the last date in x.
        date_range = pd.date_range(
            start=x[config.params.date_column].min(),
            end=x[config.params.date_column].max(),
            freq=data_freq,
        )
    else:
        # Other frequencies: index only the dates that actually occur in x.
        date_range = np.sort(x[config.params.date_column].unique())

    date_range_df = pd.DataFrame({config.params.date_column: date_range})
    # Dates are cast to str, so the date column in x must hold matching strings.
    date_range_df[config.params.date_column] = date_range_df[config.params.date_column].astype(str)

    # The position of each date in the range becomes its time_idx.
    date_range_df["time_idx"] = date_range_df.index
    x = pd.merge(x, date_range_df, on=[config.params.date_column], how="left")
    return x
```
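For daily data, both variants can also be computed directly from date offsets, without building a lookup table and merging. The following is only a minimal sketch of that idea, not the method used above: the column names `store`, `date` and `demand` are made up for illustration, and the date column is assumed to be a real datetime column rather than strings.

```python
import pandas as pd

# Toy data: two series of different length (column names are illustrative).
df = pd.DataFrame({
    "store": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(
        ["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-02", "2023-01-03"]
    ),
    "demand": [5, 7, 6, 2, 3],
})

# Global index: days elapsed since the earliest date in the whole dataset,
# so the same calendar day gets the same time_idx in every series.
df["time_idx_global"] = (df["date"] - df["date"].min()).dt.days

# Per-series index: days elapsed since each series' own first date.
first_date = df.groupby("store")["date"].transform("min")
df["time_idx_per_series"] = (df["date"] - first_date).dt.days
```

With the global variant, store B starts at index 1 because its first observation falls one day after the dataset's first date; with the per-series variant every series starts at 0.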

Actual behavior

So can you explain what the right way to create and pass time_idx for TFT is?

strateg17 avatar Feb 06 '23 21:02 strateg17

@strateg17 I had a similar problem. What I did was fill the missing values with whatever you like (a random number, 0, or a forward fill), but assign those rows a weight of zero. The loss is then not calculated for those timestamps. At least, this is what I have done.
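A rough pandas sketch of the fill-and-weight idea, under stated assumptions: the columns `store`, `date` and `demand` and the helper `fill_gaps` are hypothetical, the data is daily, and observed rows never have a missing `demand` (so a NaN after reindexing marks a filled-in row).

```python
import pandas as pd

# Toy data with a gap: store "A" has no row for 2023-01-03.
df = pd.DataFrame({
    "store": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(
        ["2023-01-01", "2023-01-02", "2023-01-04", "2023-01-02", "2023-01-03"]
    ),
    "demand": [5.0, 7.0, 6.0, 2.0, 3.0],
})


def fill_gaps(group: pd.DataFrame) -> pd.DataFrame:
    """Reindex one series to a gap-free daily range and flag the added rows."""
    full_range = pd.date_range(group["date"].min(), group["date"].max(), freq="D")
    out = group.set_index("date").reindex(full_range)
    out.index.name = "date"
    # Rows created by reindex have NaN demand: weight 0, observed rows weight 1.
    out["weight"] = out["demand"].notna().astype(float)
    out["demand"] = out["demand"].fillna(0.0)
    out["store"] = group["store"].iloc[0]
    return out.reset_index()


# Fill each series separately, then stack the results back together.
filled = pd.concat(
    [fill_gaps(g) for _, g in df.groupby("store")], ignore_index=True
)
```

Here the missing day for store A is added with `demand` 0.0 and `weight` 0.0, while all observed rows keep weight 1.0. Filling with zero is just one choice; a forward fill would work the same way as long as the weight column is computed before filling.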

sairamtvv avatar Apr 04 '23 10:04 sairamtvv

@sairamtvv, I used forward fill with zeros and it worked like a charm)

strateg17 avatar Apr 05 '23 19:04 strateg17

@sairamtvv, could you please elaborate on how to set the weight to zero?

jensonwang99 avatar Jul 06 '23 01:07 jensonwang99

Sorry for the late reply, but this is how I understand it. You can assign weights to the timestamps (how much importance each one should be given). So fill the missing values, for example with ffill or bfill, and assign a very small weight to the timestamps that were missing. Perhaps some experts can also comment on this methodology.

sairamtvv avatar Sep 29 '23 16:09 sairamtvv

Use the `weight` option of the TimeSeriesDataSet.
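A minimal sketch of what that could look like, assuming the gaps are already filled and a `weight` column exists. The toy frame, its column names, the encoder and prediction lengths, and the helper `build_dataset` are all illustrative; only the `weight` argument (the name of the weight column) is the point here.

```python
import pandas as pd

# Toy frame: gaps already filled, with weight 0.0 on the filled-in row.
df = pd.DataFrame({
    "store": ["A"] * 6,
    "time_idx": list(range(6)),
    "demand": [5.0, 7.0, 0.0, 6.0, 4.0, 8.0],
    "weight": [1.0, 1.0, 0.0, 1.0, 1.0, 1.0],
})


def build_dataset(data: pd.DataFrame):
    """Build a TimeSeriesDataSet that reads sample weights from `weight`."""
    # Imported lazily so the data preparation above runs without the library.
    from pytorch_forecasting import TimeSeriesDataSet

    return TimeSeriesDataSet(
        data,
        time_idx="time_idx",
        target="demand",
        group_ids=["store"],
        weight="weight",  # name of the column holding per-row weights
        max_encoder_length=3,
        max_prediction_length=1,
        time_varying_unknown_reals=["demand"],
    )
```

Calling `build_dataset(df)` requires pytorch-forecasting to be installed. How strongly a zero weight suppresses a timestamp depends on the loss metric in use, so it is worth checking that against your version.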

sairamtvv avatar Nov 09 '23 17:11 sairamtvv