scoringutils
scoringutils copied to clipboard
Implement ways to immediately warn when a forecast object becomes invalid
I see it has been discussed a bit in https://github.com/epiforecasts/scoringutils/issues/507 but I am uncomfortable with the fact we re-validate the class in the
print()method. Not just because it’s inefficient and non-standard, but mostly because it may be the symptom of a deeper issue. Once it has been created, we should ensure (to the best of our ability) that it remains valid no matter what. And the moment it stopps being valid, we should throw an error immediately, not wait for theprint()method to be called.This is an inherent weakness of S3 classes, which are loosely defined, but we can add safeguards. This could take the form of a custom
[method which ensures crucial columns are never dropped.
Originally posted by @Bisaloo in https://github.com/epiforecasts/scoringutils/pull/791#pullrequestreview-2003401135
Some ways in which a forecast object can become invalid:
- you delete one of the protected columns (
model,observed,predicted,quantile_level...) - you delete a column that is part of the forecast unit in a way that now a single forecast is not uniquely identified anymore (meaning that
get_duplicate_forecasts()would fail - you add additional columns (e.g. you end up with both a
quantile_leveland asample_idcolumn) - you
rbindadditional rows, leading to the same error - you delete rows such that none remain or that only rows remain that have an
NAvalue somewhere (meaning they get filtered out)
I'm not sure we'd be able to catch all of these immediately. What do you think @Bisaloo @seabbs @sbfnk
Also is this something we should do before the CRAN release, or possibly afterwards?
I am also not sure how we should do this. Suggestions @Bisaloo?
As far as I can tell, all the changes mentioned here are done via [, or a function that internally uses it (to be confirmed for rbind()).
So with a custom [.forecast() method, you should be able to immediately error when the object becomes invalid:
- Run
NextMethod()([.data.frame()in most/all cases) - Validate the object
- Error if invalid OR return is valid
Also is this something we should do before the CRAN release, or possibly afterwards?
I'd say it's strictly speaking a breaking change but it doesn't really change anything in the API so it can possibly wait.
@Bisaloo would you possibly be willing to take on this PR? I imagine you'd by far be best placed to tackle it.
I'm fully booked until mid-July but could have a go after. Would this work with your timeline?
Yes, that would be perfect, merci beaucoup!
@Bisaloo what does your current schedule look like, would you have a chance to look at this again? Thank you!
I had a quick look last week but got blocked by an infinite loop of functions calling one another.
I'll try to have another look on Friday if possible :crossed_fingers:
Merci!
@Bisaloo what do you think about the next CRAN release? My current understanding is that this is something we should address before the next release, is that right?
I think it's probably slightly better indeed as it's likely to cause subtle breaking changes, which would fit well under 2.0.0.
Note that submission to CRAN is closed for the next two weeks anyways for the summer holiday & maintenance.
A nice additional benefit of this: we can get rid of the function validate_forecast(). The function is essentially the same as assert_forecast(), but returns the input data instead of an invisible NULL.
validate_forecast <- function(forecast, forecast_type = NULL, verbose = TRUE) {
assert_forecast(forecast, forecast_type, verbose)
return(forecast)
}
I think this is done now thanks to @Bisaloo! Please reopen if not fully done.