scoringutils icon indicating copy to clipboard operation
scoringutils copied to clipboard

Implement ways to immediately warn when a forecast object becomes invalid

Open nikosbosse opened this issue 1 year ago • 6 comments

I see it has been discussed a bit in https://github.com/epiforecasts/scoringutils/issues/507 but I am uncomfortable with the fact we re-validate the class in the print() method. Not just because it’s inefficient and non-standard, but mostly because it may be the symptom of a deeper issue. Once it has been created, we should ensure (to the best of our ability) that it remains valid no matter what. And the moment it stopps being valid, we should throw an error immediately, not wait for the print() method to be called.

This is an inherent weakness of S3 classes, which are loosely defined, but we can add safeguards. This could take the form of a custom [ method which ensures crucial columns are never dropped.

Originally posted by @Bisaloo in https://github.com/epiforecasts/scoringutils/pull/791#pullrequestreview-2003401135

nikosbosse avatar May 19 '24 07:05 nikosbosse

Some ways in which a forecast object can become invalid:

  • you delete one of the protected columns (model, observed, predicted, quantile_level...)
  • you delete a column that is part of the forecast unit in a way that now a single forecast is not uniquely identified anymore (meaning that get_duplicate_forecasts() would fail
  • you add additional columns (e.g. you end up with both a quantile_level and a sample_id column)
  • you rbind additional rows, leading to the same error
  • you delete rows such that none remain or that only rows remain that have an NA value somewhere (meaning they get filtered out)

I'm not sure we'd be able to catch all of these immediately. What do you think @Bisaloo @seabbs @sbfnk

Also is this something we should do before the CRAN release, or possibly afterwards?

nikosbosse avatar May 19 '24 08:05 nikosbosse

I am also not sure how we should do this. Suggestions @Bisaloo?

seabbs avatar May 20 '24 21:05 seabbs

As far as I can tell, all the changes mentioned here are done via [, or a function that internally uses it (to be confirmed for rbind()).

So with a custom [.forecast() method, you should be able to immediately error when the object becomes invalid:

  1. Run NextMethod() ([.data.frame() in most/all cases)
  2. Validate the object
  3. Error if invalid OR return is valid

Also is this something we should do before the CRAN release, or possibly afterwards?

I'd say it's strictly speaking a breaking change but it doesn't really change anything in the API so it can possibly wait.

Bisaloo avatar May 21 '24 10:05 Bisaloo

@Bisaloo would you possibly be willing to take on this PR? I imagine you'd by far be best placed to tackle it.

nikosbosse avatar Jun 02 '24 15:06 nikosbosse

I'm fully booked until mid-July but could have a go after. Would this work with your timeline?

Bisaloo avatar Jun 03 '24 06:06 Bisaloo

Yes, that would be perfect, merci beaucoup!

nikosbosse avatar Jun 13 '24 18:06 nikosbosse

@Bisaloo what does your current schedule look like, would you have a chance to look at this again? Thank you!

nikosbosse avatar Jul 24 '24 20:07 nikosbosse

I had a quick look last week but got blocked by an infinite loop of functions calling one another.

I'll try to have another look on Friday if possible :crossed_fingers:

Bisaloo avatar Jul 30 '24 09:07 Bisaloo

Merci!

nikosbosse avatar Jul 30 '24 10:07 nikosbosse

@Bisaloo what do you think about the next CRAN release? My current understanding is that this is something we should address before the next release, is that right?

nikosbosse avatar Aug 04 '24 15:08 nikosbosse

I think it's probably slightly better indeed as it's likely to cause subtle breaking changes, which would fit well under 2.0.0.

Note that submission to CRAN is closed for the next two weeks anyways for the summer holiday & maintenance.

Bisaloo avatar Aug 05 '24 08:08 Bisaloo

A nice additional benefit of this: we can get rid of the function validate_forecast(). The function is essentially the same as assert_forecast(), but returns the input data instead of an invisible NULL.

validate_forecast <- function(forecast, forecast_type = NULL, verbose = TRUE) {
  assert_forecast(forecast, forecast_type, verbose)
  return(forecast)
}

nikosbosse avatar Aug 10 '24 08:08 nikosbosse

I think this is done now thanks to @Bisaloo! Please reopen if not fully done.

nikosbosse avatar Sep 11 '24 19:09 nikosbosse