[ENH] forecasting benchmarking task experiment
This PR adds a `SktimeForecastingTask`, which defines a full benchmarking run for a forecaster that is passed later in `_evaluate`.
This object could be used as a "task" in the sktime `ForecastingBenchmark`.
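For orientation, here is a rough, purely hypothetical sketch of the interface described above: the task bundles the series, the CV splitter, and the metric, and the forecaster arrives only when `_evaluate` is called. Constructor arguments, defaults, and the result handling are assumptions for illustration, not the PR's actual code.

```python
# Hypothetical sketch only; not the implementation in this PR.
import numpy as np

from sktime.forecasting.model_evaluation import evaluate
from sktime.performance_metrics.forecasting import MeanAbsolutePercentageError
from sktime.split import ExpandingWindowSplitter


class SktimeForecastingTask:
    """Bundles data, CV splitter, and metric; the forecaster arrives in _evaluate."""

    def __init__(self, y, cv=None, scoring=None, error_score=np.nan):
        self.y = y
        self.cv = cv if cv is not None else ExpandingWindowSplitter(fh=[1])
        self.scoring = scoring if scoring is not None else MeanAbsolutePercentageError()
        self.error_score = error_score

    def _evaluate(self, forecaster):
        # one backtest per CV fold; evaluate returns a DataFrame with test_* columns
        results = evaluate(
            forecaster=forecaster,
            y=self.y,
            cv=self.cv,
            scoring=self.scoring,
            error_score=self.error_score,
        )
        # column naming convention discussed further down: "test_<scoring.name>"
        return float(results[f"test_{self.scoring.name}"].mean())
```

Usage would then mirror the tuning experiment, e.g. `SktimeForecastingTask(load_airline())._evaluate(NaiveForecaster())`, with `load_airline` from `sktime.datasets` and `NaiveForecaster` from `sktime.forecasting.naive`.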
Draft for discussion and design review:
- it is quite similar to, and partially duplicative with, `SktimeForecastingExperiment`, which is used in tuning. How should we deal with the similarity and intersection?
  - we could merge them into a single class, with behaviour depending on whether a `forecaster` gets passed or not. Not sure where that leads, though (a rough sketch of this option follows the list).
- is this a possible 1:1 drop-in (or almost) for the task object in `sktime`?
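To make the merge option concrete, a minimal sketch under the assumption that the only behavioural difference is whether the forecaster is bound at construction (experiment-style, as in tuning) or passed per call to `_evaluate` (task-style). The class and parameter names are placeholders, not existing sktime or PR code.

```python
# Hypothetical sketch of the merged-class option; names are placeholders.
import numpy as np

from sktime.forecasting.model_evaluation import evaluate


class ForecastingExperimentOrTask:
    """One class for both uses: bind the forecaster up front (tuning experiment)
    or pass it per call to _evaluate (benchmarking task)."""

    def __init__(self, y, cv, scoring, forecaster=None):
        self.y, self.cv, self.scoring = y, cv, scoring
        self.forecaster = forecaster  # None means task-style use

    def _evaluate(self, forecaster=None):
        forecaster = forecaster if forecaster is not None else self.forecaster
        if forecaster is None:
            raise ValueError(
                "bind a forecaster at construction or pass one to _evaluate"
            )
        results = evaluate(
            forecaster=forecaster, y=self.y, cv=self.cv, scoring=self.scoring
        )
        return float(np.nanmean(results[f"test_{self.scoring.name}"]))
```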
@arnavk23, can you kindly explain what you corrected and why?
- **Added validation for `forecaster` in `params`.** The original version assumed `params["forecaster"]` always existed. I added an explicit check and a clear error message, because missing or incorrect parameters otherwise raise cryptic errors deeper inside `sktime.evaluate` (see the first sketch after this list).
- **Made scoring metric handling more robust.** The previous code assumed that any scoring object implements `get_tag("lower_is_better")`. I wrapped this in a try/except and added correct defaults for both cases (`scoring=None` or custom metrics).
- **Safely applied the `higher_is_better` tag.** `set_tags()` was called without handling the case where it fails or is not supported.
- **Improved parsing of the output from `sktime.evaluate()`.** The previous implementation assumed that the result is always a DataFrame and that the scoring column name is always exactly `"test_<scoring.name>"`. I added support for both DataFrame-like and dict-like outputs, a fallback to the first available `test_*` column if the expected name isn't present, and a warning when the fallback happens (see the second sketch after this list).
- **Better error handling during evaluate.** Previously, any exception inside `evaluate()` could crash or create inconsistent behavior. Now `error_score="raise"` preserves the expected behavior; otherwise the method returns `(error_score, {"error": ...})`.
- **Robust conversion of results to a scalar.** The earlier implementation assumed you can always do `float(results.mean())`. I added the use of `np.nanmean`, a fallback to `np.asarray` if needed, and structured error reporting if even that fails.
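For the first three items, a minimal sketch of the kind of defensive handling described, written as a free-standing helper. The function name, the default metric, and the object receiving the tag are assumptions for illustration, not the PR's actual code.

```python
# Illustrative sketch of items 1-3; not the actual PR code.
from sktime.performance_metrics.forecasting import MeanAbsolutePercentageError


def resolve_forecaster_and_scoring(task, params):
    # item 1: fail early with a clear message instead of a cryptic KeyError
    # deeper inside sktime's evaluate
    if "forecaster" not in params or params["forecaster"] is None:
        raise ValueError(
            "params must contain a 'forecaster' entry holding an sktime forecaster"
        )
    forecaster = params["forecaster"]

    # item 2: default metric when scoring is None; tolerate custom metrics
    # that do not implement get_tag("lower_is_better")
    scoring = params.get("scoring") or MeanAbsolutePercentageError()
    try:
        lower_is_better = scoring.get_tag("lower_is_better")
    except Exception:
        lower_is_better = True  # assumed default: treat unknown metrics as error-like

    # item 3: propagate the direction via set_tags, but do not crash if the
    # receiving object does not support it
    try:
        task.set_tags(**{"higher_is_better": not lower_is_better})
    except Exception:
        pass

    return forecaster, scoring, lower_is_better
```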
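And for items 4 through 6, a sketch of tolerant result parsing and scalar reduction around `sktime.evaluate`. Again, the function name and the `(score, info)` return convention are assumptions made for illustration.

```python
# Illustrative sketch of items 4-6; not the actual PR code.
import warnings

import numpy as np
import pandas as pd
from sktime.forecasting.model_evaluation import evaluate


def run_evaluation(forecaster, y, cv, scoring, error_score=np.nan):
    # item 5: error_score="raise" re-raises; otherwise report the error
    # alongside the sentinel score instead of crashing the whole benchmark
    try:
        results = evaluate(forecaster=forecaster, y=y, cv=cv, scoring=scoring)
    except Exception as exc:
        if error_score == "raise":
            raise
        return error_score, {"error": repr(exc)}

    # item 4: accept DataFrame-like or dict-like output and fall back to the
    # first test_* column, with a warning, if the expected name is missing
    frame = results if isinstance(results, pd.DataFrame) else pd.DataFrame(results)
    expected = f"test_{scoring.name}"
    if expected in frame.columns:
        col = frame[expected]
    else:
        test_cols = [c for c in frame.columns if str(c).startswith("test_")]
        if not test_cols:
            return error_score, {"error": "no test_* column in evaluate output"}
        warnings.warn(f"column {expected!r} not found, using {test_cols[0]!r}")
        col = frame[test_cols[0]]

    # item 6: reduce fold scores to one scalar, ignoring NaNs from failed folds
    try:
        return float(np.nanmean(np.asarray(col, dtype=float))), {}
    except Exception as exc:
        return error_score, {"error": repr(exc)}
```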
@arnavk23, is this AI generated?
Yes, the remark is AI-generated.