esrnn_torch

I have some questions about the input and output

Open • chendiva opened this issue 3 years ago • 20 comments

Hi there, I am using time series data that has only two columns, Date and Price. Can I use this algorithm in this situation, training the model on the price alone and predicting future prices? In other words, can the model build the lagged inputs automatically, so that I don't need to create the "lag" features myself? Thank you for your help!

chendiva, Jul 29 '20 01:07

Hi Chendiva, try to parse your data as described in the README (https://github.com/kdgutier/esrnn_torch), with the price data in y_df, and add a simple constant to X_df if you don't have any exogenous variables. The algorithm requires the same column names as the README dataframes; the lag variables are calculated within the algorithm. Good luck.

kdgutier, Jul 29 '20 03:07
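A minimal sketch of the conversion suggested above, turning a two-column Date/Price frame into the long-format X_df and y_df. The column names (unique_id, ds, x, y) are taken from the long_to_wide code quoted later in this thread; the helper name to_esrnn_frames, the series id "price" and the constant x = 1.0 are hypothetical placeholders, not part of the library.

```python
import pandas as pd


def to_esrnn_frames(raw, series_id="price", constant=1.0):
    """Convert a Date/Price frame into long-format X_df and y_df (hypothetical helper).

    Column names follow the long_to_wide code quoted later in this thread;
    series_id and constant are arbitrary placeholders.
    """
    # Parse dates and sort chronologically.
    df = raw.assign(Date=pd.to_datetime(raw["Date"])).sort_values("Date")
    # A single series, so every row shares one identifier (kept as a string).
    base = pd.DataFrame({"unique_id": series_id, "ds": df["Date"].to_numpy()})
    # No real exogenous variable: use a constant dummy x.
    X_df = base.assign(x=constant)
    y_df = base.assign(y=df["Price"].astype(float).to_numpy())
    return X_df, y_df


# Hypothetical input with only Date and Price columns.
raw = pd.DataFrame({
    "Date": ["2020-01-02", "2020-01-01", "2020-01-03"],
    "Price": [10.5, 10.0, 10.2],
})
X_df, y_df = to_esrnn_frames(raw)
print(X_df)
print(y_df)
```

Sorting by date inside the helper keeps X_df and y_df row-aligned with each other, which matters later because the quoted long_to_wide code copies y_df['y'] into X_df by index.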

Hi, will adding the exogenous variable affect the forecasting result?

chendiva, Jul 29 '20 03:07

I recommend always having benchmarks to test complex models such as the ESRNN; in our case we included the OWA metric on the validation set to compare the relative performance against the Naive2 model, as done in the M4 competition. Take extra care with the learning-rate hyperparameters when tuning your model.

kdgutier, Jul 29 '20 04:07
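A minimal sketch of the M4-style OWA comparison mentioned above, OWA = 0.5 (sMAPE / sMAPE_bench + MASE / MASE_bench), where a value below 1 means the model beats the benchmark. The function names are hypothetical. The benchmark here is a plain naive forecast standing in for Naive2 (which, in M4, is applied after removing seasonality), and the seasonal period m = 1 is an assumption for non-seasonal data. In M4 the two errors were averaged over all series before taking the ratios; this sketch scores a single series.

```python
import numpy as np


def smape(y, y_hat):
    """Symmetric MAPE in percent, as defined for M4."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return 200.0 * np.mean(np.abs(y - y_hat) / (np.abs(y) + np.abs(y_hat)))


def mase(y_train, y, y_hat, m=1):
    """Forecast MAE scaled by the in-sample seasonal-naive MAE (period m >= 1)."""
    y_train = np.asarray(y_train, dtype=float)
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    err = np.abs(np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float))
    return np.mean(err) / scale


def owa(y_train, y, y_hat, y_hat_bench, m=1):
    """OWA of a forecast relative to a benchmark forecast; below 1 beats the benchmark."""
    return 0.5 * (smape(y, y_hat) / smape(y, y_hat_bench)
                  + mase(y_train, y, y_hat, m) / mase(y_train, y, y_hat_bench, m))


# Toy series: a model forecast vs. a plain naive benchmark (last value repeated).
y_train = [10.0, 10.5, 10.2, 10.8]
y_test = [11.0, 11.2]
naive = [y_train[-1]] * len(y_test)
model = [10.9, 11.3]
print(owa(y_train, y_test, model, naive))
```

Computing OWA on a validation window with and without the exogenous variable is one way to answer the question empirically, as suggested in the thread.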

Sorry, I am still confused. If I add the exogenous variable as you recommend, will it affect the result? Was the x variable in your example added by you, or is it originally included in the dataset and used for forecasting?

chendiva, Jul 29 '20 04:07

I would recommend answering the question empirically (with the OWA metric): try the model and see whether the performance remains acceptable.

kdgutier, Jul 29 '20 04:07

This also gives me NaN values for my y_hat.

chendiva, Jul 30 '20 01:07

Hi chendiva, I checked the bug. The ESRNN produces correct outputs, but they fail to merge with X_test_df in the predict method if the frequency of the dataset is not specified correctly. For instance, in the M3 dataset the frequency seems to be 'MS', since the dates fall at the beginning of the month; in the previously reported bug they were using the 'M' frequency, which is for dates at the end of the month. Let me know if this solves the problem.

kdgutier, Jul 30 '20 03:07
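A minimal pandas illustration of the failure mode described above: forecasts stamped with dates generated from the wrong frequency do not match the test dates, so a left merge fills y_hat with NaN. attach_forecasts is a hypothetical helper that imitates the merge step; it is not the library's predict method. pd.offsets.MonthEnd() is used for month-end dates because recent pandas versions deprecate the 'M' alias in favour of 'ME'.

```python
import numpy as np
import pandas as pd

# Hypothetical test set: monthly data stamped on the FIRST day of each month.
X_test_df = pd.DataFrame({
    "unique_id": ["s1"] * 3,
    "ds": pd.date_range("2020-01-01", periods=3, freq="MS"),
})


def attach_forecasts(X_test_df, last_train_ds, y_hat, freq):
    """Stamp forecasts with dates built from `freq`, then merge on (unique_id, ds)."""
    # horizon + 1 dates starting at the last training date; drop that first date.
    ds = pd.date_range(last_train_ds, periods=len(y_hat) + 1, freq=freq)[1:]
    preds = pd.DataFrame({"unique_id": "s1", "ds": ds, "y_hat": y_hat})
    return X_test_df.merge(preds, on=["unique_id", "ds"], how="left")


y_hat = np.array([10.0, 11.0, 12.0])
ok = attach_forecasts(X_test_df, "2019-12-01", y_hat, freq="MS")
bad = attach_forecasts(X_test_df, "2019-12-01", y_hat, freq=pd.offsets.MonthEnd())
print(ok["y_hat"].isna().sum())   # 0
print(bad["y_hat"].isna().sum())  # 3
```

With the month-end offset the generated dates are Jan 31, Feb 29 and Mar 31, none of which exist in X_test_df, so every forecast is lost in the merge even though the forecasts themselves are fine.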

Hi, how can I set the frequency, then?

chendiva, Jul 30 '20 03:07

When you instantiate the ESRNN model: model = ESRNN(params, ..., frequency='MS')

kdgutier, Jul 30 '20 03:07
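To decide which value to pass as frequency, one option is to let pandas infer it from the dates with pd.infer_freq, which needs at least three dates and returns None when the spacing is irregular. guess_frequency is a hypothetical wrapper, and whether every pandas alias is accepted by ESRNN is an assumption: the thread only shows 'MS' and 'D'. For daily prices recorded only on trading days, the inferred alias may be 'B' rather than 'D', or None if holidays break the pattern.

```python
import pandas as pd


def guess_frequency(dates):
    """Return pandas' inferred frequency alias for a sequence of dates (hypothetical helper).

    Requires at least 3 distinct dates; returns None for irregular spacing.
    """
    idx = pd.DatetimeIndex(pd.to_datetime(list(dates))).unique().sort_values()
    return pd.infer_freq(idx)


print(guess_frequency(pd.date_range("2020-01-01", periods=6, freq="MS")))  # MS
print(guess_frequency(pd.date_range("2020-01-01", periods=6, freq="D")))   # D
print(guess_frequency(pd.bdate_range("2020-01-06", periods=10)))           # B
```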

I am actually using my own dataset now, not M3. My dataset is daily, so I set frequency = 'D', but then I got this error: [image: image] https://user-images.githubusercontent.com/22489898/88876633-b3145880-d1f1-11ea-9e72-9cb9161432ee.png

chendiva, Jul 30 '20 03:07

That is a safeguard meant to protect the model from NaN values. Clean the NaNs from the data before using the ESRNN.

kdgutier, Jul 30 '20 03:07
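A minimal cleaning sketch, assuming the long-format X_df/y_df layout used in this thread (columns unique_id, ds, x and y); drop_non_finite is a hypothetical helper. It uses np.isfinite rather than isnull() because isnull() does not flag +/-inf, so a frame can pass df.isnull().values.any() and still contain non-finite values.

```python
import numpy as np
import pandas as pd


def drop_non_finite(X_df, y_df):
    """Drop rows whose y is NaN or +/-inf, keeping X_df and y_df aligned.

    Hypothetical helper; assumes both frames share the unique_id/ds keys.
    """
    # np.isfinite is False for NaN, inf and -inf; isnull() only catches NaN.
    mask = np.isfinite(y_df["y"].to_numpy(dtype=float))
    y_clean = y_df.loc[mask].reset_index(drop=True)
    # Keep only the X_df rows whose (unique_id, ds) survived in y_clean.
    X_clean = X_df.merge(y_clean[["unique_id", "ds"]], on=["unique_id", "ds"], how="inner")
    return X_clean, y_clean


# Toy example: one clean value, one NaN, one infinity.
X_df = pd.DataFrame({
    "unique_id": ["a"] * 3,
    "ds": pd.date_range("2020-01-01", periods=3, freq="D"),
    "x": 1.0,
})
y_df = X_df[["unique_id", "ds"]].assign(y=[1.0, np.nan, np.inf])
X_clean, y_clean = drop_non_finite(X_df, y_df)
print(len(X_clean), len(y_clean))  # 1 1
```

Dropping interior observations leaves gaps in the date grid; for a price series, a per-series forward fill such as y_df.groupby('unique_id')['y'].ffill() may be the better choice.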

Yes, I actually checked the dataframe with df.isnull().values.any(), which returns False, but I still get the above result.

chendiva, Jul 30 '20 03:07

Have you tried printing those unique_ids? And have you tried the isnan function?

kdgutier, Jul 30 '20 03:07

You are treating the unique_ids as a numeric variable. I suggest checking the README on GitHub, where the input dataframes for the model are explained in detail.

On Wed, 29 Jul 2020 at 11:30 pm, chendiva wrote:

> I actually got this after using the function you mentioned: [image: image] https://user-images.githubusercontent.com/22489898/88877408-63369100-d1f3-11ea-9fff-69b4875d9096.png

kdgutier, Jul 30 '20 03:07
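The reply points at numeric unique_ids. A minimal sketch of casting them to strings in both frames, assuming (as the reply suggests) that the model expects unique_id to be a label rather than a number; ids_as_strings is a hypothetical helper, and the README remains the authoritative description of the expected dtypes.

```python
import pandas as pd


def ids_as_strings(*frames):
    """Return copies of the frames with unique_id cast to str (hypothetical helper)."""
    return tuple(f.assign(unique_id=f["unique_id"].astype(str)) for f in frames)


# Toy frames where unique_id was accidentally stored as an integer.
X_df = pd.DataFrame({
    "unique_id": [1, 1],
    "ds": pd.to_datetime(["2020-01-01", "2020-01-02"]),
    "x": 1.0,
})
y_df = X_df[["unique_id", "ds"]].assign(y=[10.0, 10.5])
X_df, y_df = ids_as_strings(X_df, y_df)
print(X_df["unique_id"].tolist())  # ['1', '1']
```

Casting both frames in one call keeps the identifiers identical in X_df and y_df, which matters if they are later joined on unique_id.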

Hi there, I have the same problem as you; have you solved it? I tried slicing the M4 data provided by the prepare_m4_data function and found that, even when I made sure the identifiers in the training and test sets were the same, it still generated NaN for the evaluation metrics and the predictions, which was weird.

Yu-1245, Jul 31 '20 17:07
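Besides matching identifiers, a sliced train/test split can also be checked for test dates that do not come after the training data. check_split is a hypothetical helper sketched for long-format frames with unique_id and ds columns; the thread does not establish that overlapping dates cause the NaN, so treat this only as a diagnostic.

```python
import pandas as pd


def check_split(y_train_df, y_test_df):
    """Report ids present in only one split and series whose test data overlaps training.

    Hypothetical diagnostic for long-format frames with unique_id and ds columns.
    """
    train_ids = set(y_train_df["unique_id"])
    test_ids = set(y_test_df["unique_id"])
    last_train = y_train_df.groupby("unique_id")["ds"].max()
    first_test = y_test_df.groupby("unique_id")["ds"].min()
    common = sorted(train_ids & test_ids)
    return {
        "only_in_train": sorted(train_ids - test_ids),
        "only_in_test": sorted(test_ids - train_ids),
        # Test data should start strictly after the last training date.
        "overlapping": [u for u in common if first_test[u] <= last_train[u]],
    }


train = pd.DataFrame({
    "unique_id": ["a", "a", "b"],
    "ds": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-01"]),
    "y": [1.0, 2.0, 3.0],
})
test = pd.DataFrame({
    "unique_id": ["a", "c"],
    "ds": pd.to_datetime(["2020-01-03", "2020-01-03"]),
    "y": [4.0, 5.0],
})
report = check_split(train, test)
print(report)
```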

Hi!

I think this answer could be useful.

AzulGarza, Jul 31 '20 17:07

Hi, I saw your answer. I have checked my dataset and made the changes you mentioned, but it still generates NaN for me. @FedericoGarza

Yu-1245, Jul 31 '20 17:07

> Hi, there, I got the same problem with yours, have you solved it? I tried to slice the m4 data provided from the prepare_m4_data function, and found out that even I make sure the identifier in the training set and testing set are the same, it still generated NaN for the evaluation methods and the predictions, which was weird.

No, I haven't solved the problem yet, even though I tried his method.

chendiva, Jul 31 '20 22:07

Hi, I have the same problem with my dataset. While looking for the cause, I found that the NaN values first appear in the long_to_wide function, more precisely in the for loop. Any idea how to solve this? My data is structured exactly according to the specifications.

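```python
import numpy as np

def long_to_wide(self, X_df, y_df):
    # Combine exogenous variables and target into one long-format frame.
    # Assignment aligns on index labels, not row position: rows of X_df whose
    # index label has no match in y_df get y = NaN.
    data = X_df.copy()
    data['y'] = y_df['y'].copy()

    # Map each distinct date to an integer position (0, 1, 2, ...).
    sorted_ds = np.sort(data['ds'].unique())
    ds_map = {}
    for dmap, t in enumerate(sorted_ds):
        ds_map[t] = dmap
    data['ds_map'] = data['ds'].map(ds_map)
    data = data.sort_values(by=['ds_map', 'unique_id'])

    # One row per series, one column per date position; pivot fills every
    # (unique_id, date) pair that has no observation with NaN.
    df_wide = data.pivot(index='unique_id', columns='ds_map')['y']

    # First exogenous value and last date of each series, indexed by unique_id.
    x_unique = data[['unique_id', 'x']].groupby('unique_id').first()
    last_ds = data[['unique_id', 'ds']].groupby('unique_id').last()
    assert len(x_unique) == len(data.unique_id.unique())
    df_wide['x'] = x_unique
    df_wide['last_ds'] = last_ds
    df_wide = df_wide.reset_index().rename_axis(None, axis=1)

    # Split into static features (X) and the wide target matrix (y).
    ds_cols = data.ds_map.unique().tolist()
    X = df_wide.filter(items=['unique_id', 'x', 'last_ds']).values
    y = df_wide.filter(items=ds_cols).values

    return X, y
```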

Worben, Aug 23 '20 12:08
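Two pandas behaviours in the long_to_wide code above can plausibly produce NaN in y even when df.isnull() on the input frames reports nothing: pivot leaves holes wherever a series has no row for a given date (series with different start or end dates, or gaps), and data['y'] = y_df['y'] aligns on index labels rather than row position. The toy reproduction below uses hypothetical data and plain pandas only; it does not call esrnn_torch, and the thread does not confirm that either behaviour is the cause here.

```python
import pandas as pd

# Hypothetical long-format data: series "b" starts one day later than "a".
X_df = pd.DataFrame({
    "unique_id": ["a", "a", "a", "b", "b"],
    "ds": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03",
                          "2020-01-02", "2020-01-03"]),
    "x": 1.0,
})
y_df = X_df[["unique_id", "ds"]].assign(y=[1.0, 2.0, 3.0, 5.0, 6.0])
print(y_df.isnull().values.any())  # False: the long frame is clean

# Source 1: pivoting series with different date ranges leaves NaN holes.
wide = y_df.pivot(index="unique_id", columns="ds", values="y")
print(int(wide.isna().sum().sum()))  # 1: "b" has no value on 2020-01-01

# Source 2: column assignment aligns on index labels, not position.
y_other = y_df.copy()
y_other.index = range(10, 15)  # same rows, different index labels
data = X_df.copy()
data["y"] = y_other["y"]
print(int(data["y"].isna().sum()))  # 5: no label matches

# Diagnostic: compare date coverage per series.
print(y_df.groupby("unique_id")["ds"].agg(["min", "max", "count"]))
```

If the coverage table shows different start dates, end dates or counts per series, the NaN produced by pivot are expected; if X_df and y_df come from separate slices, resetting both indexes (or merging on unique_id and ds) before calling the model avoids the alignment issue.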

Have you solved the issue, Worben?

kdgutier, Sep 04 '20 21:09