pytorch-forecasting
What is the best way to generate time_idx for a global TFT model when the time series in the data have different lengths?
- PyTorch-Forecasting version: 0.10.3
- PyTorch version: 1.13
- Python version: 3.8
- Operating System: Ubuntu 18.04
Expected behavior
I am going to create a global model to predict demand for each pair of Store and Product. For this purpose I have written my own functions for time_idx generation. The first function creates a separate time_idx for each time series in the data:
```python
import addict
import pandas as pd


def time_idx_func(x: pd.DataFrame, config: addict.Dict, data_freq: str = "D") -> pd.DataFrame:
    """Create an integer time index for a dataframe from its datetime column.

    pytorch-forecasting requires such a column.

    Args:
        x (pd.DataFrame): Input dataframe with a datetime column.
        config (addict.Dict): Config; config.params.date_column names the date column.
        data_freq (str): Frequency passed to pd.date_range.

    Returns:
        pd.DataFrame: Input dataframe with an added integer "time_idx" column.
    """
    date_col = config.params.date_column
    # Complete calendar between the first and last date of this frame.
    date_range = pd.date_range(start=x[date_col].min(), end=x[date_col].max(), freq=data_freq)
    date_range_df = pd.DataFrame({date_col: date_range})
    # Cast to str so the merge key matches a string date column in x
    # (assumes x stores dates as ISO strings such as "2021-01-31").
    date_range_df[date_col] = date_range_df[date_col].astype(str)
    # Position in the calendar becomes the time index (0, 1, 2, ...).
    date_range_df["time_idx"] = date_range_df.index
    return pd.merge(x, date_range_df, on=[date_col], how="left")
```
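Because the first function takes the start and end of its calendar from whatever frame it receives, it yields a per-series index only when it is applied to each store/product group separately (presumably via a groupby). A vectorized sketch of the same idea without the merge, assuming daily data, a datetime64 `date` column, and hypothetical `store`/`product`/`demand` columns:

```python
import pandas as pd

# Hypothetical long-format data: two store/product series of different lengths.
df = pd.DataFrame({
    "store": ["s1", "s1", "s1", "s2", "s2"],
    "product": ["p1", "p1", "p1", "p1", "p1"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03",
                            "2021-01-02", "2021-01-03"]),
    "demand": [5, 3, 4, 7, 6],
})

# Per-series time_idx: days elapsed since each series' own first date,
# so every series starts at 0 (the "separate time_idx" variant).
first_date = df.groupby(["store", "product"])["date"].transform("min")
df["time_idx"] = (df["date"] - first_date).dt.days
print(df["time_idx"].tolist())  # [0, 1, 2, 0, 1]
```

With this variant the same time_idx value points to different calendar dates in different series, so the index alone no longer aligns the series in time.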
The second function creates a global time_idx, shared by all time series in the data:

```python
import addict
import numpy as np
import pandas as pd


def time_idx_func(x: pd.DataFrame, config: addict.Dict, data_freq: str = "D") -> pd.DataFrame:
    """Create an integer time index for a dataframe from its datetime column.

    pytorch-forecasting requires such a column.

    Args:
        x (pd.DataFrame): Input dataframe with a datetime column.
        config (addict.Dict): Config; config.params.date_column names the date column.
        data_freq (str): Frequency passed to pd.date_range.

    Returns:
        pd.DataFrame: Input dataframe with an added integer "time_idx" column.
    """
    date_col = config.params.date_column
    if data_freq == "D":
        # Daily data: complete calendar over the whole dataset, so a missing
        # day in a series shows up as a jump in its time_idx.
        date_range = pd.date_range(start=x[date_col].min(), end=x[date_col].max(), freq=data_freq)
    else:
        # Other frequencies: only the dates that actually occur, sorted.
        date_range = np.sort(x[date_col].unique())
    date_range_df = pd.DataFrame({date_col: date_range})
    # Cast to str so the merge key matches a string date column in x
    # (assumes x stores dates as ISO strings such as "2021-01-31").
    date_range_df[date_col] = date_range_df[date_col].astype(str)
    # The same calendar is shared by every series, so time_idx is global.
    date_range_df["time_idx"] = date_range_df.index
    return pd.merge(x, date_range_df, on=[date_col], how="left")
```
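A global index is the more common choice: the Stallion tutorial in the pytorch-forecasting docs derives time_idx from the date and subtracts the minimum, so series of different lengths simply cover different time_idx ranges. A minimal sketch of both branches of the second function without the string merge, assuming a datetime64 `date` column and hypothetical `store`/`product` columns:

```python
import pandas as pd

# Hypothetical data with a calendar gap (2021-01-03 is missing everywhere).
df = pd.DataFrame({
    "store": ["s1", "s1", "s1", "s2", "s2"],
    "product": ["p1"] * 5,
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-04",
                            "2021-01-02", "2021-01-04"]),
})

# Daily data: days elapsed since the earliest date in the whole dataset.
# Calendar gaps show up as jumps in time_idx.
df["time_idx_daily"] = (df["date"] - df["date"].min()).dt.days

# Irregular data: position of each date among the sorted unique dates,
# equivalent to the np.sort(...unique()) branch above.
df["time_idx_unique"] = df["date"].rank(method="dense").astype(int) - 1

print(df["time_idx_daily"].tolist())   # [0, 1, 3, 1, 3]
print(df["time_idx_unique"].tolist())  # [0, 1, 2, 1, 2]
```

As far as I know, `TimeSeriesDataSet` raises an error when consecutive time_idx values within a series differ by more than 1, unless `allow_missing_timesteps=True` is passed; the alternative is to fill the gaps, as suggested in the replies.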
Actual behavior
So, what is the right way to create time_idx and pass it to TFT?
@strateg17 I had a similar problem. What I did was fill the missing values with a random number, 0, or ffill (whatever you want), but assign them a weight of zero. Then the loss is not calculated for those time stamps. At least, this is what I have done.
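The fill-and-zero-weight approach could look like this in pandas. It is a sketch under assumptions: daily frequency, hypothetical column names, 0 as the placeholder value, and a hypothetical helper `fill_gaps`.

```python
import pandas as pd

# Hypothetical data: s1 is missing 2021-01-02.
df = pd.DataFrame({
    "store": ["s1", "s1", "s2", "s2"],
    "product": ["p1"] * 4,
    "date": pd.to_datetime(["2021-01-01", "2021-01-03",
                            "2021-01-01", "2021-01-02"]),
    "demand": [5.0, 4.0, 7.0, 6.0],
})


def fill_gaps(group: pd.DataFrame) -> pd.DataFrame:
    """Reindex one series to a complete daily calendar (hypothetical helper)."""
    full_dates = pd.date_range(group["date"].min(), group["date"].max(), freq="D")
    out = group.set_index("date").reindex(full_dates)
    out.index.name = "date"
    # Rows created by reindex have NaN demand: these are the missing steps.
    out["weight"] = out["demand"].notna().astype(float)
    out["demand"] = out["demand"].fillna(0.0)  # placeholder, ignored via weight
    out[["store", "product"]] = out[["store", "product"]].ffill()
    return out.reset_index()


filled = pd.concat(
    [fill_gaps(g) for _, g in df.groupby(["store", "product"])],
    ignore_index=True,
)
# Global time_idx, now without gaps inside any series.
filled["time_idx"] = (filled["date"] - filled["date"].min()).dt.days
print(filled[["store", "date", "demand", "weight", "time_idx"]])
```

The resulting `weight` column would then be passed to `TimeSeriesDataSet` through its `weight` argument. As I understand it, the weight only scales the loss; the placeholder zeros are still visible to the model as encoder inputs.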
@sairamtvv, I used forward fill with zeros and it worked like a charm :)
@sairamtvv, could you please elaborate on how to assign a weight of zero?
Sorry for the late reply, but this is how I understand it: you can assign weights to the time stamps (i.e. how much importance each one should be given). So fill the missing values (with a constant, ffill, or bfill) and assign a very small weight to the time stamps that were missing. Perhaps some experts can also comment on this methodology.
See the `weight` option of `TimeSeriesDataSet`.
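To see why a weight of zero (or a very small weight) removes a time step from training, here is a generic weighted MAE in plain Python. It only illustrates the idea; it is not pytorch-forecasting's metric code, and normalising by the weight sum is an assumption.

```python
def weighted_mae(y_true, y_pred, weights):
    """Weighted mean absolute error: each step's error is scaled by its weight.

    Normalising by the weight sum is one common choice (an assumption here);
    a step with weight 0 contributes nothing to the result.
    """
    total = sum(w * abs(t - p) for t, p, w in zip(y_true, y_pred, weights))
    return total / sum(weights)


# The middle step is a zero-filled gap: its large error is ignored.
print(weighted_mae([5.0, 0.0, 4.0], [4.0, 9.0, 4.0], [1.0, 0.0, 1.0]))  # 0.5
```

A very small but nonzero weight, as suggested above, would keep the filled steps in the loss with a reduced influence instead of removing them entirely.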