
[ENH] forecasting benchmarking task experiment

Open fkiraly opened this issue 4 months ago • 4 comments

This PR adds a SktimeForecastingTask, which defines a full benchmarking run for a forecaster that is passed later in _evaluate.

This object could be used as a "task" in the sktime ForecastingBenchmark.

Draft for discussion and reviewing the design:

  • it is quite similar to, and partially duplicative of, SktimeForecastingExperiment, which is used in tuning. How should we deal with the similarity and intersection?
    • we could merge them into a single class, branching on whether a forecaster gets passed or not. Not sure where that leads, though.
  • is this a possible 1:1 drop-in (or almost) for the task object in sktime? A sketch of what that could look like follows below.
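To make the drop-in question concrete, here is a rough sketch contrasting the current sktime usage with what a task-object drop-in might look like. The ForecastingBenchmark calls follow sktime's documented API; the SktimeForecastingTask constructor arguments and the single-argument add_task call are assumptions for illustration, not an existing interface:

```python
from sktime.benchmarking.forecasting import ForecastingBenchmark
from sktime.datasets import load_airline
from sktime.forecasting.naive import NaiveForecaster
from sktime.performance_metrics.forecasting import MeanSquaredPercentageError
from sktime.split import ExpandingWindowSplitter

benchmark = ForecastingBenchmark()
benchmark.add_estimator(NaiveForecaster(strategy="mean", sp=12))
cv = ExpandingWindowSplitter(initial_window=24, step_length=12, fh=12)

# current sktime style: the "task" is passed as loader, splitter, scorers
benchmark.add_task(load_airline, cv, [MeanSquaredPercentageError()])

# hypothetical drop-in style: one task object bundling the same pieces;
# the constructor arguments below are assumed, not the PR's actual signature
# task = SktimeForecastingTask(
#     dataset_loader=load_airline,
#     cv_splitter=cv,
#     scorers=[MeanSquaredPercentageError()],
# )
# benchmark.add_task(task)

results = benchmark.run("results.csv")
```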

fkiraly avatar Aug 24 '25 14:08 fkiraly

@arnavk23, can you kindly explain what you corrected and why?

fkiraly avatar Nov 22 '25 11:11 fkiraly

> @arnavk23, can you kindly explain what you corrected and why?

  1. Added validation for forecaster in params. The original version assumed params["forecaster"] always existed. I added an explicit check with a clear error message, because missing or incorrect parameters otherwise raise cryptic errors deep inside sktime's evaluate (see the first sketch after this list).

  2. Made scoring metric handling more robust. The previous code assumed that any scoring object implements get_tag("lower_is_better"). I wrapped this in a try/except and added correct defaults for both cases (scoring=None or custom metrics).

  3. Safely applied the higher_is_better tag. Previously, set_tags() was called without handling the case where it fails or is not supported.

  4. Improved parsing of the output from sktime.evaluate(). The previous implementation assumed that the result is always a DataFrame and that the scoring column name is always exactly "test_<scoring.name>". I added support for both DataFrame-like and dict-like outputs, a fallback to the first available test_* column if the expected name isn't present, and a warning when the fallback happens (see the second sketch after this list).

  5. Better error handling during evaluate. Previously, any exception inside evaluate() could crash the run or create inconsistent behavior. Now error_score="raise" preserves the expected raising behavior; otherwise the method returns (error_score, {"error": <message>}).

  6. Robust conversion of results to a scalar. The earlier implementation assumed you can always do float(results.mean()). I added use of np.nanmean, a fallback to np.asarray if needed, and structured error reporting if even that fails.
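A minimal sketch of the defensive setup described in points 1–3; the helper name _check_setup and the fallback default are illustrative, not the code in this PR:

```python
import warnings


def _check_setup(params, scoring):
    # point 1: fail early with a clear message if the forecaster is missing,
    # rather than letting sktime's evaluate raise a cryptic error later
    if "forecaster" not in params or params["forecaster"] is None:
        raise ValueError(
            "params must contain a 'forecaster' entry with an sktime forecaster"
        )

    # points 2-3: not every scoring object supports the tag interface,
    # so guard get_tag and fall back to a sensible default
    lower_is_better = True  # sktime error metrics are lower-is-better by default
    if scoring is not None:
        try:
            lower_is_better = scoring.get_tag("lower_is_better")
        except (AttributeError, ValueError):
            warnings.warn(
                "scoring does not expose the 'lower_is_better' tag; "
                "assuming lower is better"
            )
    return params["forecaster"], lower_is_better
```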
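And a sketch of the result handling from points 4–6, returning the same (score, info) shape mentioned in point 5; the helper name _extract_score is hypothetical:

```python
import warnings

import numpy as np
import pandas as pd


def _extract_score(results, scoring_name, error_score=np.nan):
    # point 4: accept both DataFrame-like and dict-like evaluate outputs
    if isinstance(results, dict):
        results = pd.DataFrame(results)

    # point 4: fall back to the first test_* column if the expected
    # "test_<scoring.name>" column is not present, warning on fallback
    expected = f"test_{scoring_name}"
    if expected in results.columns:
        col = results[expected]
    else:
        candidates = [c for c in results.columns if c.startswith("test_")]
        if not candidates:
            return error_score, {"error": "no test_* column in evaluate output"}
        warnings.warn(f"column {expected!r} not found; using {candidates[0]!r}")
        col = results[candidates[0]]

    # point 6: nan-aware aggregation, with an array fallback and
    # structured error reporting if even that fails
    try:
        return float(np.nanmean(col)), {}
    except TypeError:
        try:
            return float(np.nanmean(np.asarray(col, dtype=float))), {}
        except (TypeError, ValueError) as exc:
            return error_score, {"error": str(exc)}
```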

arnavk23 avatar Nov 22 '25 12:11 arnavk23

@arnavk23, is this AI generated?

fkiraly avatar Nov 28 '25 00:11 fkiraly

> @arnavk23, is this AI generated?

Yes, the remark is AI-generated.

arnavk23 avatar Nov 28 '25 00:11 arnavk23