Good First Issue: Allow `predict` method to accept date values as `steps`

Open JavierEscobarOrtiz opened this issue 4 months ago • 4 comments

Use branch 0.14.x as base.

Summary

Currently, the `steps` parameter of every Forecaster's `predict` method accepts only an integer, which defines how many observations to forecast into the future. We would like to extend it so that `steps` can also accept a date (e.g., `'2020-01-01'`). If a date is provided, the function should calculate the number of observations between the last observation in the last window and the given date.

Task

  1. Create an auxiliary function, `_preprocess_steps_as_date(last_window: pd.Series, steps)`, in the `utils` module:
  • `last_window` is the last window of the series used to forecast the future. This is an argument of the `predict` method in all Forecasters.
  • `steps` can be an integer or any datetime format that pandas allows to be passed to a `pd.DatetimeIndex` (e.g., string, pandas timestamp...).
  • If `steps` is an integer, return the same integer.
  • If `steps` is a date and the Forecaster was not fitted using a `pd.DatetimeIndex`, raise a `TypeError` with the message: "If the Forecaster was not fitted using a pd.DatetimeIndex, steps must be an integer."
  • If `steps` is a date and the Forecaster was fitted using a `pd.DatetimeIndex`, return the length of the time window between the last observation in the last window and the given date as an integer value.
  • Create unit tests using pytest in the `utils.tests` folder.
# Expected behavior
# ==============================================================================
import pandas as pd

# Date given, DatetimeIndex: 2020-01-05 -> 2020-01-07 is two daily steps
last_window = pd.Series([1, 2, 3, 4, 5], index=pd.date_range('2020-01-01', periods=5, freq='D'))
_preprocess_steps_as_date(last_window, '2020-01-07')  # expected output: 2

# Integer given: returned unchanged
last_window = pd.Series([1, 2, 3, 4, 5], index=pd.date_range('2020-01-01', periods=5, freq='D'))
_preprocess_steps_as_date(last_window, 2)  # expected output: 2

# Date given, but the index is not a DatetimeIndex
last_window = pd.Series([1, 2, 3, 4, 5], index=pd.RangeIndex(start=0, stop=5, step=1))
_preprocess_steps_as_date(last_window, '2020-01-07')  # expected: raises TypeError
  2. Integrate this function in the `predict` method of the `ForecasterAutoreg` class.

Acceptance Criteria

  • [ ] The steps parameter accepts both integer and date formats.
  • [ ] The function correctly calculates the number of steps when a date is provided.
  • [ ] Existing tests continue to pass.
  • [ ] New test cases are added to verify the correct behavior for both int and date inputs.
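
The new test cases could be sketched with pytest along these lines. The test names and the `make_daily_window` helper are hypothetical, and the compact `_preprocess_steps_as_date` defined at the top is only a stand-in so the file runs on its own; in the repository the tests would import the real function from the utils module.

```python
import pandas as pd
import pytest


def _preprocess_steps_as_date(last_window, steps):
    # Stand-in for the function under test (hypothetical, simplified).
    if isinstance(steps, int):
        return steps
    if not isinstance(last_window.index, pd.DatetimeIndex):
        raise TypeError(
            "If the Forecaster was not fitted using a pd.DatetimeIndex, "
            "steps must be an integer."
        )
    future = pd.date_range(
        start=last_window.index[-1],
        end=pd.to_datetime(steps),
        freq=last_window.index.freq,
    )
    return len(future) - 1


def make_daily_window():
    # Five daily observations ending on 2020-01-05.
    index = pd.date_range("2020-01-01", periods=5, freq="D")
    return pd.Series([1, 2, 3, 4, 5], index=index)


@pytest.mark.parametrize("steps", ["2020-01-07", pd.Timestamp("2020-01-07")])
def test_date_is_converted_to_number_of_steps(steps):
    assert _preprocess_steps_as_date(make_daily_window(), steps) == 2


def test_integer_steps_are_returned_unchanged():
    assert _preprocess_steps_as_date(make_daily_window(), 2) == 2


def test_type_error_when_index_is_not_datetime():
    last_window = pd.Series([1, 2, 3, 4, 5], index=pd.RangeIndex(start=0, stop=5, step=1))
    with pytest.raises(TypeError, match="steps must be an integer"):
        _preprocess_steps_as_date(last_window, "2020-01-07")
```

The three tests mirror the three expected-behavior cases listed in the task: a date with a `pd.DatetimeIndex`, an integer, and a date without a `pd.DatetimeIndex`.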

Full Example

# Expected behavior
# ==============================================================================
data = fetch_dataset(name="h2o", kwargs_read_csv={"names": ["y", "datetime"], "header": 0})

steps = 36
data_train = data[:-steps]
data_test  = data[-steps:]

forecaster = ForecasterAutoreg(
                 regressor = LGBMRegressor(random_state=123, verbose=-1),
                 lags      = 15 
             )
forecaster.fit(y=data_train['y'])

predictions = forecaster.predict(steps='2005-09-01') # As steps=3

2005-07-01    1.020833
2005-08-01    1.021721
2005-09-01    1.093488
Freq: MS, Name: pred, dtype: float64

JavierEscobarOrtiz, Oct 08 '24 07:10