skforecast
Good First Issue: Allow `predict` method to accept date values as `steps`
Use branch 0.14.x as base.
Summary
Currently, the `steps` parameter in all Forecasters' `predict` methods only accepts an integer value. This integer defines how many observations to forecast into the future. We would like to extend this functionality so that `steps` can also accept a date (e.g., `'2020-01-01'`). If a date is provided, the function should calculate the appropriate number of observations corresponding to the time window between the last observation in the last window and the given date.
Task
- Create an auxiliary function, `_preprocess_steps_as_date(last_window: pd.Series, steps)`, in the `utils` module:
  - `last_window` is the last window of the series used to forecast the future. This is an argument of the `predict` method in all Forecasters.
  - `steps` can be an integer or any datetime format that pandas allows to be passed to a `pd.DatetimeIndex` (e.g., string, pandas timestamp...).
  - If `steps` is an integer, return the same integer.
  - If `steps` is a date and the Forecaster was not fitted using a `pd.DatetimeIndex`, raise a `TypeError` with the message: "If the Forecaster was not fitted using a pd.DatetimeIndex, `steps` must be an integer."
  - If `steps` is a date and the Forecaster was fitted using a `pd.DatetimeIndex`, return, as an integer, the number of observations between the last observation in the last window and the given date.
- Create unit tests using pytest in the `utils.tests` folder.
# Expected behavior
# ==============================================================================
last_window = pd.Series([1, 2, 3, 4, 5], index=pd.date_range('2020-01-01', periods=5, freq='D'))
_preprocess_steps_as_date(last_window, '2020-01-07') # expected output: 2
last_window = pd.Series([1, 2, 3, 4, 5], index=pd.date_range('2020-01-01', periods=5, freq='D'))
_preprocess_steps_as_date(last_window, 2) # expected output: 2
last_window = pd.Series([1, 2, 3, 4, 5], index=pd.RangeIndex(start=0, stop=5, step=1))
_preprocess_steps_as_date(last_window, '2020-01-07') # expected output: TypeError
- Integrate this function in the `predict` method of the `ForecasterAutoreg` class.
Acceptance Criteria
- [ ] The `steps` parameter accepts both integer and date formats.
- [ ] The function correctly calculates the number of steps when a date is provided.
- [ ] Existing tests continue to pass.
- [ ] New test cases are added to verify the correct behavior for both int and date inputs.
Full Example
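# Expected behavior
# ==============================================================================
# Imports added for completeness; the skforecast paths are assumed to match
# the branch this issue targets and may differ in other versions.
from lightgbm import LGBMRegressor
from skforecast.datasets import fetch_dataset
from skforecast.ForecasterAutoreg import ForecasterAutoreg

data = fetch_dataset(name="h2o", kwargs_read_csv={"names": ["y", "datetime"], "header": 0})
steps = 36
data_train = data[:-steps]
data_test = data[-steps:]

forecaster = ForecasterAutoreg(
    regressor = LGBMRegressor(random_state=123, verbose=-1),
    lags = 15
)
forecaster.fit(y=data_train['y'])
predictions = forecaster.predict(steps='2005-09-01') # As steps=3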
2005-07-01    1.020833
2005-08-01    1.021721
2005-09-01    1.093488
Freq: MS, Name: pred, dtype: float64
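The `# As steps=3` comment can be checked with pandas alone. The snippet below is a small illustration, assuming the training data ends on 2005-06-01 (implied by the first predicted date, 2005-07-01) and uses the month-start frequency `MS` shown in the output.

```python
import pandas as pd

# Assumption: last training observation, one period before the first forecast.
last_train_date = pd.Timestamp("2005-06-01")
target_date = pd.Timestamp("2005-09-01")

# Build the range including the last training date, then drop it so that
# only the dates to be forecast remain.
forecast_index = pd.date_range(start=last_train_date, end=target_date, freq="MS")[1:]
n_steps = len(forecast_index)
forecast_dates = list(forecast_index.strftime("%Y-%m-%d"))

print(n_steps)         # 3
print(forecast_dates)  # ['2005-07-01', '2005-08-01', '2005-09-01']
```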