Holidays are not identified in Sub-daily data
Describe the bug
When we tried to use the model with hourly data, we found that holidays are not taken into account in the modeling. The example we are trying to run is the Carnival holiday in Brazil, March 1, 2022. Running the model both without and with the holiday returns the same result for both executions, i.e. the holiday was not included.
While trying to solve the problem, I noticed that with the data expanded per hour, the dataframe built with the events only flags the first hour of the day with 1 and leaves the other hours at 0. I tried to work around this myself by setting 1 for all the other hours as well, but even with this logic the model still does not use the information. (A minimal sketch of this behaviour is shown below.)
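For illustration, a minimal, self-contained sketch of the behaviour described above (the series and event names are made up; it assumes the same create_df_with_events API used in the full code later in this thread):

import pandas as pd
from neuralprophet import NeuralProphet
# Hypothetical hourly series covering the Carnival holiday (2022-03-01).
hourly = pd.DataFrame({'ds': pd.date_range('2022-02-28', '2022-03-02 23:00', freq='H'), 'y': 1.0})
carnival = pd.DataFrame({'event': 'carnival', 'ds': pd.to_datetime(['2022-03-01'])})
m = NeuralProphet()
m = m.add_events(['carnival'])
history = m.create_df_with_events(hourly, carnival)
# Only the 00:00 row of 2022-03-01 gets carnival == 1; the other 23 hourly rows stay 0.
print(history[history['ds'].dt.date.astype(str) == '2022-03-01'][['ds', 'carnival']])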
Screenshots
Using the model with the holidays: history_df and result
Code that sets the value 1 for all the other hours of the holiday, since using only create_df_with_events was not flagging all hours
Using the model without the holidays, and the result
Hi @FBonke Thank you for filing this issue. Can you please provide a more minimal code example to reproduce the issue? Thank you
Hi @ourownstory
I used the test data from https://www.kaggle.com/datasets/robikscube/hourly-energy-consumption. Note: I made some manual changes in the last week of records to create an extreme outlier.
import pandas as pd
from neuralprophet import NeuralProphet, df_utils, utils

# AEP_hourly.csv is the file from the Kaggle dataset linked above; a local read works the same way.
test = pd.read_csv('s3a://40-ze-datalake-sandbox/01_projects/06_darkstore/02_development/Forecast/AEP_hourly.csv')
holidays_test = pd.DataFrame({
'event': 'holidays_test',
'ds': pd.to_datetime([
'2018-08-02'
]),
})
holidays_test2 = pd.DataFrame({
'event': 'holidays_test2',
'ds': pd.to_datetime([
'2018-08-01'
]),
})
events_df = pd.concat((holidays_test, holidays_test2))
test2 = test.rename(columns={'Datetime':'ds','AEP_MW':'y'})
test2['ds'] = test2['ds'].astype('datetime64[ns]')
test2['y'] = test2['y'].astype('float')
test2 = test2[(test2['ds'] > '2018')]
test2 = df_utils.add_missing_dates_nan(test2, freq='H')
test2 = test2[0]
test2= test2.fillna(0)
data = test2
period = 7*24
events = ['holidays_test']
model = NeuralProphet(
    n_forecasts=period,
    n_lags=24 * 7,
    yearly_seasonality=False,
    weekly_seasonality=True,
    daily_seasonality=True,
    epochs=10,
    loss_func="MSE",
    seasonality_mode="multiplicative",
)
model = model.add_events(['holidays_test'], mode="multiplicative")
model = model.add_events(['holidays_test2'], mode="multiplicative")
history_df = model.create_df_with_events(data, events_df)
metrics = model.fit(history_df, freq='H')
future = model.make_future_dataframe(df=history_df, events_df=events_df, n_historic_predictions=True)
forecast = model.predict(df = future)
results = utils.fcst_df_to_last_forecast(forecast, n_last=1)[-(model.n_forecasts+ model.n_lags):]
neuro_prophet_forecast_test = results[['ds','yhat1']][(-period):].reset_index(drop=True)
neuro_prophet_forecast_test['yhat1'] = neuro_prophet_forecast_test['yhat1'].astype('float')
neuro_prophet_forecast_test = neuro_prophet_forecast_test.round(0)
neuro_prophet_forecast_test = pd.DataFrame({
"date": neuro_prophet_forecast_test['ds'].dt.date,
"hour": neuro_prophet_forecast_test['ds'].dt.hour,
"n": neuro_prophet_forecast_test['yhat1'].astype(int)
})
forecast_test = neuro_prophet_forecast_test.pivot_table(index=['date'], columns=['hour'], values=['n'])
forecast_test.head(8)
Running this code and the simple version without the holidays apparently gives the same result, always being hurt by the holiday dates (which are outliers).
If we look at history_df.head(), we find that the value 1 appears only on the rows at 00:00 and is not set for the other hours.
As I said before, I tried to get around this by setting the 1 for the other hours in another way, but even so the model still does not make use of the holiday information (a quick way to check this is shown below).
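A quick diagnostic for this, assuming the history_df built above (the column name matches the event added earlier):

# Count flagged rows per hour of day for the event column.
flagged = history_df[history_df['holidays_test'] > 0]
print(flagged['ds'].dt.hour.value_counts().sort_index())
# If only hour 0 shows up, the event was applied to midnight only.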
Thanks and sorry for my English
Hi @FBonke Thank you for clarifying this.
From what I can see, this is what may be happening:
- Your events occur over a 24-hour window.
- You specify a date on which the event occurs.
- The code converts the date to a single timestamp at midnight of the day.
- The other timestamps of the targeted date remain non-events.
To solve this, try specifying an event occurrence for each timestamp in your event window, e.g. 24 timestamps for a full day of hourly events (see the sketch below).
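A minimal sketch of that expansion, assuming pandas and the events_df / model / data names from your code (the helper name is just for illustration):

import pandas as pd

def expand_events_to_hourly(events_df):
    # Turn each daily event date into 24 hourly rows (midnight through 23:00).
    expanded = []
    for _, row in events_df.iterrows():
        stamps = pd.date_range(row['ds'], periods=24, freq='H')
        expanded.append(pd.DataFrame({'event': row['event'], 'ds': stamps}))
    return pd.concat(expanded, ignore_index=True)

hourly_events_df = expand_events_to_hourly(events_df)
history_df = model.create_df_with_events(data, hourly_events_df)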
(Further, I recommend using a higher number of epochs (e.g. 50), and only using multiplicative modes if you have some reason to believe the effects are multiplicative in nature.)
Hope this helps?
Hi @ourownstory
Regarding: "To solve this, try specifying an event occurrence for each timestamp in your event window, e.g. 24 timestamps for a full day of hourly events."
That is what I am doing: I create a concatenated dataframe, as shown in the screenshot in the initial report, where the code propagates the event to all the other hours. But even so it gives the same results and is still hurt by the holiday outlier.
The way I did it was to concatenate it this way before passing it to the model:
And regarding the multiplicative mode, we use the model to predict purchases, and every year there is a percentage increase in orders while the seasonal pattern is kept, so the model ended up performing better this way.
Thanks for the concern; I'll test it over the weekend with epochs at 50 to see if it gets even better. But the holiday remains a problem :(
@FBonke Sorry to hear that it did not resolve it.
If you share a more minimal piece of code that I can run myself to reproduce the issue, I can have a deeper look.
@ourownstory
holidays_sp = pd.DataFrame({
'event': 'holidays_sp',
'ds': pd.to_datetime([
'2020-01-25', '2021-01-25', '2022-01-25',
'2020-07-09', '2021-07-09', '2022-07-09',
'2020-11-20', '2021-11-20', '2022-11-20',
]),
})
holidays_nationals = pd.DataFrame({
'event': 'holidays_nationals',
'ds': pd.to_datetime([
'2020-02-24', '2021-02-15', '2022-02-28',
'2020-02-25', '2021-02-16', '2022-03-01',
'2020-02-26', '2021-02-17', '2022-03-02',
'2020-12-24', '2021-12-24', '2022-12-24',
'2020-12-25', '2021-12-25', '2022-12-25',
'2020-12-31', '2021-12-31', '2022-12-31',
'2021-01-01', '2022-01-01', '2023-01-01',
'2020-04-21', '2021-04-21', '2022-04-21',
'2020-05-01', '2021-05-01', '2022-05-01',
'2020-09-07', '2021-09-07', '2022-09-07',
'2020-10-12', '2021-10-12', '2022-10-12',
'2020-11-02', '2021-11-02', '2022-11-02',
'2020-11-15', '2021-11-15', '2022-11-15',
'2020-04-10', '2021-04-02', '2022-04-15',
'2020-04-12', '2021-04-04', '2022-04-17',
'2020-06-11', '2021-06-03', '2022-06-16',
]),
})
holidays_anticipations = pd.DataFrame({
'event': 'holidays_anticipations',
'ds': pd.to_datetime([
'2020-05-20', '2020-05-21', '2020-05-22',
'2020-05-25', '2021-03-26', '2021-03-29',
'2021-03-30', '2021-04-01',
]),
})
events_df = pd.concat((holidays_sp, holidays_nationals, holidays_anticipations))
def find_model_events(df, events):
    # Sum the event flags per calendar date, so a flag on any hour of a date marks the whole date.
    return df.groupby('date', as_index=False).sum()[['date'] + events]

def enhance_events_df(df, events):
    # Propagate the per-date event flags back to every hourly row of that date.
    df['date'] = df['ds'].dt.date.astype(str)
    model_events = find_model_events(df, events)
    return df.drop(columns=events).merge(model_events, how='left', on='date').drop('date', axis=1)
# jabaquara2 is our own hourly dataframe with columns 'ds' and 'y' (not shared in this thread).
jabaquara2 = df_utils.add_missing_dates_nan(jabaquara2, freq='H')
jabaquara2 = jabaquara2[0]
jabaquara2= jabaquara2.fillna(0)
data = jabaquara2
period = 7*24
events = ['holidays_sp', 'holidays_nationals', 'holidays_anticipations']
model = NeuralProphet(
    n_forecasts=period,
    n_lags=24 * 7,
    yearly_seasonality=False,
    weekly_seasonality=True,
    daily_seasonality=True,
    epochs=10,
    loss_func="MSE",
    seasonality_mode="multiplicative",
)
#model = model.add_events(events)
model = model.add_events(["holidays_sp"], mode="multiplicative")
model = model.add_events(["holidays_nationals"], mode="multiplicative")
model = model.add_events(["holidays_anticipations"], mode="multiplicative")
history_df = model.create_df_with_events(data, events_df)
history_df = enhance_events_df(history_df, events)
metrics = model.fit(history_df, freq='H')
future = model.make_future_dataframe(df=history_df, events_df=events_df, n_historic_predictions=True)
forecast = model.predict(df = future)
results = utils.fcst_df_to_last_forecast(forecast, n_last=1)[-(model.n_forecasts+ model.n_lags):]
neuro_prophet_forecast_jabaquara = results[['ds','yhat1']][(-period):].reset_index(drop=True)
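To verify that the expansion to hourly rows worked, a small sanity check (the date is one of the holidays_nationals dates and has to fall inside the history):

# After enhance_events_df, every hourly row of a holiday date should carry the flag.
check_day = history_df[history_df['ds'].dt.date.astype(str) == '2022-03-01']
print(check_day[['ds', 'holidays_nationals']].head(24))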
If you need anything just let me know
Thanks! Do you mind also including the data, or another dataset that I can use to reproduce? Thank you!
Hi @FBonke, do you still have that issue?
Closed due to inactivity.