mlr3proba

(wish)list of probabilistic regression losses to implement

Open fkiraly opened this issue 6 years ago • 4 comments

probabilistic regression

  • [x] log-loss aka cross-entropy loss, for continuous distributions
  • [ ] squared (integrated) loss for continuous distributions
  • [ ] integrated Brier score aka continuous ranked probability score (CRPS), for any real-valued distribution
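
As a rough illustration of the three losses above, here is a minimal Python sketch for a Gaussian predictive distribution N(mu, sigma^2), where all three have closed forms. The function names are hypothetical and are not mlr3proba or distr6 API. The squared (integrated) loss is assumed to be -2 p(y) + ||p||_2^2, and the integrated Brier score uses the standard closed-form CRPS of a normal distribution.

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(y, mu, sigma):
    """Distribution function of N(mu, sigma^2) at y."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))


def log_loss(y, mu, sigma):
    """Log-loss: negative log predictive density at the observation."""
    return -math.log(normal_pdf(y, mu, sigma))


def integrated_squared_loss(y, mu, sigma):
    """Squared (integrated) loss, assumed here to be -2 p(y) + ||p||_2^2."""
    # For a normal density the squared 2-norm is 1 / (2 sigma sqrt(pi)).
    squared_2norm = 1.0 / (2.0 * sigma * math.sqrt(math.pi))
    return -2.0 * normal_pdf(y, mu, sigma) + squared_2norm


def crps_normal(y, mu, sigma):
    """Integrated Brier score (CRPS) of N(mu, sigma^2), closed form."""
    z = (y - mu) / sigma
    pdf_z = normal_pdf(z, 0.0, 1.0)
    cdf_z = normal_cdf(z, 0.0, 1.0)
    return sigma * (z * (2.0 * cdf_z - 1.0) + 2.0 * pdf_z - 1.0 / math.sqrt(math.pi))


if __name__ == "__main__":
    for name, loss in [("log-loss", log_loss),
                       ("squared", integrated_squared_loss),
                       ("CRPS", crps_normal)]:
        print(name, loss(0.3, 0.0, 1.0))
```

All three are negatively oriented (lower is better), so they can be averaged over test observations in the same way.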

interval prediction

  • [ ] quantile interval score aka double Pinball loss, and normalized versions
  • [ ] interval length; within-interval-rate (warning: not proper)
  • [ ] mean-variance likelihood score
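
The first two interval items can be sketched in a few lines of Python. This assumes a central (1 - alpha) prediction interval [lower, upper] and the usual interval score, width plus 2/alpha times the distance by which the observation misses the interval; the function names are hypothetical. With these conventions the interval score equals 2/alpha times the sum of the pinball losses at levels alpha/2 and 1 - alpha/2, which is the "double pinball" reading.

```python
def pinball_loss(y, q, tau):
    """Pinball (quantile) loss of predicted quantile q at level tau."""
    indicator = 1.0 if y < q else 0.0
    return (y - q) * (tau - indicator)


def double_pinball_loss(y, lower, upper, alpha):
    """Sum of pinball losses of the two interval ends at alpha/2 and 1 - alpha/2."""
    return (pinball_loss(y, lower, alpha / 2.0)
            + pinball_loss(y, upper, 1.0 - alpha / 2.0))


def interval_score(y, lower, upper, alpha):
    """Interval score: width plus a 2/alpha penalty for missing the interval."""
    width = upper - lower
    below = (2.0 / alpha) * max(lower - y, 0.0)
    above = (2.0 / alpha) * max(y - upper, 0.0)
    return width + below + above


def mean_interval_length(intervals):
    """Average width of the predicted intervals (not a proper score)."""
    return sum(upper - lower for lower, upper in intervals) / len(intervals)


def within_interval_rate(ys, intervals):
    """Fraction of observations inside their interval (not a proper score)."""
    hits = sum(1 for y, (lower, upper) in zip(ys, intervals) if lower <= y <= upper)
    return hits / len(ys)


if __name__ == "__main__":
    print(interval_score(-1.0, 0.0, 1.0, alpha=0.1))
    print(within_interval_rate([0.5, -1.0], [(0.0, 1.0), (0.0, 1.0)]))
```

Interval length and within-interval rate are only meaningful when reported together: a very wide interval trivially maximises the rate, which is why neither is proper on its own.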

fkiraly avatar Oct 31 '19 22:10 fkiraly

:+1: for Brier scores.

mllg avatar Nov 01 '19 19:11 mllg

With regard to Brier scores, we need to decide how to calculate numerical estimates when the required analytical norm isn't accessible. Two options:

  1. Implement a good numerical estimator for the Brier score that doesn't rely on the norm but instead integrates numerically over all time points (i.e. like pec)
  2. Implement a good numerical estimator in distr6.

The latter option seems better to me, since mlr3proba then doesn't have to deal with the problems of numerical estimation, which distr6 does anyway.
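
For reference, the kind of numerical estimator under discussion can be sketched in Python as follows. This is only an illustration under stated assumptions, not pec's, distr6's or mlr3proba's implementation: it integrates the squared difference between the predictive cdf and the step function at the observation with a trapezoid rule on a truncated range, and checks the result against the closed form for a standard normal. All names, the truncation bounds and the grid size are hypothetical choices.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def trapezoid(f, a, b, n):
    """Composite trapezoid rule for f on [a, b] with n sub-intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def crps_numeric(cdf, y, lower, upper, n=2000):
    """Integrated Brier score: integral of (F(x) - 1{x >= y})^2 over [lower, upper].

    The integral is split at y so that neither piece contains the jump.
    """
    if not lower < y < upper:
        raise ValueError("y must lie strictly inside (lower, upper)")
    left = trapezoid(lambda x: cdf(x) ** 2, lower, y, n)
    right = trapezoid(lambda x: (1.0 - cdf(x)) ** 2, y, upper, n)
    return left + right


def crps_normal_closed_form(y, mu=0.0, sigma=1.0):
    """Closed-form CRPS of N(mu, sigma^2), used only as a reference value."""
    z = (y - mu) / sigma
    pdf_z = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf_z = normal_cdf(z)
    return sigma * (z * (2.0 * cdf_z - 1.0) + 2.0 * pdf_z - 1.0 / math.sqrt(math.pi))


if __name__ == "__main__":
    print(crps_numeric(normal_cdf, 0.3, -10.0, 10.0))
    print(crps_normal_closed_form(0.3))
```

The estimator only needs cdf evaluations, but its accuracy depends on the truncation range and grid, which is the sort of detail that argues for keeping it in one place.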

RaphaelS1 avatar Nov 02 '19 10:11 RaphaelS1

Let's just use the analytic version, which uses the integrated cdf and various 2-norms, and produce a warning (heuristic approximation, potentially unreliable) if these are not available.
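
A minimal Python sketch of that "analytic if available, otherwise warn and approximate" pattern, using the squared (integrated) loss as the example. The function name, the `squared_2norm` argument, and the integration bounds are hypothetical; in the design discussed here the approximation itself would sit with the distribution object rather than with the loss.

```python
import math
import warnings


def standard_normal_pdf(x):
    """Density of N(0, 1), used as the example predictive distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def integrated_squared_loss(pdf, y, squared_2norm=None,
                            lower=-20.0, upper=20.0, n=8000):
    """Squared (integrated) loss -2 p(y) + ||p||_2^2.

    If the analytic squared 2-norm is not supplied, fall back to a midpoint-rule
    approximation on [lower, upper] and warn that the result is heuristic.
    """
    if squared_2norm is None:
        warnings.warn(
            "analytic 2-norm not available; using a heuristic numerical "
            "approximation, results are potentially unreliable",
            RuntimeWarning,
        )
        h = (upper - lower) / n
        squared_2norm = h * sum(pdf(lower + (i + 0.5) * h) ** 2 for i in range(n))
    return -2.0 * pdf(y) + squared_2norm


if __name__ == "__main__":
    exact_norm = 1.0 / (2.0 * math.sqrt(math.pi))  # ||p||_2^2 for N(0, 1)
    print(integrated_squared_loss(standard_normal_pdf, 0.0, squared_2norm=exact_norm))
    print(integrated_squared_loss(standard_normal_pdf, 0.0))  # warns, then approximates
```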

In any case, I strongly prefer leaving any approximations to distr6 rather than mlr3proba; from a functionality/modularization perspective they have no business being in mlr3proba, in my opinion.

fkiraly avatar Nov 02 '19 11:11 fkiraly

PS: I assume we are all talking about the integrated Brier score, aka continuous ranked probability score (CRPS), here.

fkiraly avatar Nov 02 '19 11:11 fkiraly