scoringutils

Create a wrapper around log score to warn about its use for integer-valued forecasts

Open nikosbosse opened this issue 1 year ago • 7 comments

In the old version, we don't compute a log score for discrete sample-based forecasts. The reason is that the scoringRules implementation of the log score estimates a density from the samples, which is difficult for discrete forecasts. There could be other ways to do it: instead of estimating a density, you would assign an actual probability to every discrete value. I'm not sure how to do this (in some sense, it's the same as multiclass classification with a lot of classes?), and we don't currently have code for it.

Options seem to be

  1. compute log score for discrete forecasts anyway
  2. don't compute log score for discrete forecasts
  3. come up with some implementation
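For what it's worth, option 3 could start from something as simple as the empirical probability mass function of the samples. The sketch below is Python purely for illustration (scoringutils itself is R), and everything in it is an assumption: the function name `empirical_log_score`, the `pseudo_count` smoothing, and the choice to return infinity when the observed value was never sampled are not taken from any existing package.

```python
import math
from collections import Counter


def empirical_log_score(observed, samples, pseudo_count=0.0):
    """Negative log of the probability the samples assign to `observed`.

    Hypothetical helper: treats the samples as an empirical probability
    mass function instead of estimating a density.
    """
    counts = Counter(samples)
    # Support = distinct sampled values, plus the observed value if unseen.
    support = len(counts) + (0 if observed in counts else 1)
    # Optional additive smoothing so unseen values do not get probability 0.
    prob = (counts[observed] + pseudo_count) / (
        len(samples) + pseudo_count * support
    )
    return math.inf if prob == 0 else -math.log(prob)
```

Without smoothing, a single observation outside the sampled values gives an infinite score, which is probably the main practical problem with this approach when the number of samples is small.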

nikosbosse, Oct 26 '23 17:10

@sbfnk Do you have thoughts on this?

nikosbosse, Nov 14 '23 22:11

Make it something people can do themselves if they want to and are aware of the potential issues? I.e. it's not in the default list, but we mention that you could use it if you like risk?

seabbs, Nov 19 '23 11:11

Hm, currently we don't distinguish between integer and continuous forecasts in our methods, i.e. there is only one score.forecasts_sample

  • One could make a case against computing the log score from samples at all (as you'd have to estimate a density). But if we want to distinguish between discrete and continuous forecasts, we'd have to think about how to do that.
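To make the density-estimation point concrete, here is a rough sketch of a sample-based log score using a Gaussian kernel density estimate. It is Python for illustration only and is not the scoringRules code: the function name `kde_log_score` and the rule-of-thumb bandwidth are assumptions chosen to keep the example short.

```python
import math
import statistics


def kde_log_score(observed, samples, bandwidth=None):
    """Negative log of a Gaussian kernel density estimate at `observed`."""
    n = len(samples)
    if bandwidth is None:
        # Rule-of-thumb bandwidth; an assumption, other selectors exist.
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    # Average of Gaussian kernels centred on each sample.
    kernel_sum = sum(
        math.exp(-0.5 * ((observed - x) / bandwidth) ** 2) for x in samples
    )
    density = kernel_sum / (n * bandwidth * math.sqrt(2 * math.pi))
    return math.inf if density == 0 else -math.log(density)
```

With integer samples such as `[1, 2, 2, 3, 3, 3, 4, 4, 5]`, this returns a finite score at `2.5`, a value the forecast can never produce. The estimate smears probability mass between the integers, so the result is a density and not the probability of the observed count.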

nikosbosse, Nov 19 '23 14:11

I think we can push this to a future release as a nice-to-have?

seabbs, Nov 27 '23 16:11

Is there an underlying issue here, namely that the representation of probability distributions (sample, quantiles, analytical etc.) and the type of outcome (continuous, binary, integer) are orthogonal and both need to be known before deciding how to score, but we sometimes make a decision based on only one of these pieces of information?
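One way to picture this is a lookup keyed on both pieces of information. The sketch below is Python for illustration only; the table `SCORES_BY_FORECAST_KIND`, its entries, and both function names are hypothetical and do not describe which metrics scoringutils actually offers for each combination.

```python
# Hypothetical table: illustrative entries only, not the package's policy.
SCORES_BY_FORECAST_KIND = {
    ("sample", "continuous"): ["crps", "log_score"],
    ("sample", "integer"): ["crps"],
    ("quantile", "continuous"): ["interval_score"],
    ("quantile", "integer"): ["interval_score"],
}


def infer_outcome_type(values, tol=1e-8):
    """Classify values as 'integer' or 'continuous' (a simple heuristic)."""
    is_integer = all(abs(v - round(v)) <= tol for v in values)
    return "integer" if is_integer else "continuous"


def available_scores(representation, outcome_type):
    """Look up scores using BOTH the representation and the outcome type."""
    key = (representation, outcome_type)
    if key not in SCORES_BY_FORECAST_KIND:
        raise KeyError(f"no scores registered for combination {key}")
    return SCORES_BY_FORECAST_KIND[key]
```

For example, `available_scores("sample", infer_outcome_type([1, 4, 2]))` looks up the integer entry, so the decision can no longer be made from the representation alone.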

sbfnk, Nov 27 '23 18:11

@sbfnk I think you're right, there is such an issue in this case (and maybe in a few others, e.g. when constructing PIT histograms, or in the bias metric). Usually, I think this can be handled by the function itself, i.e. the function is called based on the representation of the probability distribution and then acts depending on whether the forecast is continuous or discrete (the binary case is handled automatically, as we're treating binary as a different representation).

In this specific instance, however, the function we are using is one from scoringRules.

We could create a wrapper around scoringRules::logs_sample() which checks whether the forecasts are discrete and produces a warning in that case.
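The shape of such a wrapper is roughly the following. This is a Python sketch for illustration, not R code for the package: `log_score_with_warning` and `samples_look_discrete` are hypothetical names, the integer check with a tolerance is an assumption, and the underlying log score is passed in as `log_score_fn` so the sketch does not depend on any particular implementation.

```python
import warnings


def samples_look_discrete(samples, tol=1e-8):
    """Heuristic: True if every sample is (numerically) an integer."""
    return all(abs(x - round(x)) <= tol for x in samples)


def log_score_with_warning(observed, samples, log_score_fn):
    """Call `log_score_fn`, warning first if the samples look discrete."""
    if samples_look_discrete(samples):
        warnings.warn(
            "Samples appear to be integer-valued; a log score based on a "
            "density estimate may not be appropriate for discrete forecasts.",
            UserWarning,
            stacklevel=2,
        )
    # The score is still computed; the warning only flags the caveat.
    return log_score_fn(observed, samples)
```

The score is returned either way, so users who understand the caveat can still use it, while everyone else at least sees the warning.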

@seabbs we could push this to a future release if we're happy with users computing a log score that is only moderately appropriate. I could also live with that.

nikosbosse, Nov 29 '23 13:11

> We could create a wrapper around scoringRules::logs_sample() which checks whether the forecasts are discrete and produces a warning in that case.

This seems like a good idea as an MVP.

seabbs, Nov 29 '23 13:11