mlr3proba
(wish)list of probabilistic regression losses to implement
probabilistic regression
- [x] log-loss aka cross-entropy loss, for continuous distr
- [ ] squared (integrated) loss for continuous distr
- [ ] integrated Brier score aka rank probability score, for any real distr
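For concreteness, here is a minimal sketch of the three losses above for a single Gaussian predictive distribution N(mu, sigma^2) and one observation y. It is plain Python using only the standard library, not mlr3proba or distr6 code; all function names are made up for illustration. The integrated squared loss is taken here as ||f||_2^2 - 2 f(y), and the integrated Brier score is computed with the known closed form for the normal distribution (often called the continuous ranked probability score, CRPS).

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(y, mu, sigma):
    """Distribution function of N(mu, sigma^2) at y."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))


def log_loss(y, mu, sigma):
    """Log-loss: negative log density at the observation."""
    return -math.log(normal_pdf(y, mu, sigma))


def integrated_squared_loss(y, mu, sigma):
    """Squared (integrated) loss: ||f||_2^2 - 2 f(y).

    For a normal density, ||f||_2^2 = 1 / (2 * sigma * sqrt(pi)).
    """
    squared_norm = 1.0 / (2.0 * sigma * math.sqrt(math.pi))
    return squared_norm - 2.0 * normal_pdf(y, mu, sigma)


def crps_normal(y, mu, sigma):
    """Integrated Brier score (CRPS) of N(mu, sigma^2), closed form."""
    z = (y - mu) / sigma
    std_pdf = normal_pdf(z, 0.0, 1.0)
    std_cdf = normal_cdf(z, 0.0, 1.0)
    return sigma * (z * (2.0 * std_cdf - 1.0) + 2.0 * std_pdf - 1.0 / math.sqrt(math.pi))
```

All three are negatively oriented (lower is better). An aggregate score over a test set would be the mean of the per-observation values.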
interval prediction
- [ ] quantile interval score aka double Pinball loss, and normalized versions
- [ ] interval length; within-interval-rate (warning: not proper)
- [ ] mean-variance likelihood score
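As a rough illustration of the interval items, the sketch below writes the quantile interval score for a central (1 - alpha) prediction interval as a scaled sum of two pinball losses, one per interval end, alongside interval length and within-interval rate. It is standard-library Python with hypothetical function names, not an existing mlr3proba interface. The mean-variance likelihood score is left out because its exact definition is not pinned down here.

```python
def pinball_loss(q, y, tau):
    """Pinball (quantile) loss of predicted tau-quantile q at observation y."""
    if y >= q:
        return tau * (y - q)
    return (1.0 - tau) * (q - y)


def interval_score(lower, upper, y, alpha):
    """Interval score of a central (1 - alpha) interval [lower, upper].

    Equals (upper - lower) plus a penalty of (2 / alpha) times the distance
    by which y falls outside the interval.
    """
    lower_part = pinball_loss(lower, y, alpha / 2.0)
    upper_part = pinball_loss(upper, y, 1.0 - alpha / 2.0)
    return (2.0 / alpha) * (lower_part + upper_part)


def interval_length(lower, upper):
    """Width of the interval; not a proper score on its own."""
    return upper - lower


def within_interval_rate(lowers, uppers, ys):
    """Fraction of observations covered; not a proper score on its own."""
    hits = [lo <= y <= up for lo, up, y in zip(lowers, uppers, ys)]
    return sum(hits) / len(hits)
```

The last two are not proper because each can be optimized trivially in isolation: a zero-width interval minimizes length, and an arbitrarily wide one maximizes coverage.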
:+1: for Brier scores.
With regard to Brier scores, we need to decide how to calculate numerical estimates when the required analytical norm isn't accessible. Two solutions:
- Implement a good numerical estimator for the Brier score that doesn't rely on the norm but instead estimates the integral over all time points (i.e. like pec)
- Implement a good numerical estimator in distr6.
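To make the first option concrete, here is a minimal sketch of a grid-based estimator: it integrates the squared difference between the predicted cdf and the step function at the observation with the trapezoidal rule. This is only an assumption about what such an estimator could look like; it does not reproduce how pec or distr6 do it, and the function names, integration bounds and step count are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """Distribution function of N(mu, sigma^2), used only as an example."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def integrated_brier_numeric(cdf, y, lower, upper, n_steps=20000):
    """Trapezoidal estimate of the integral of (cdf(x) - 1{x >= y})^2 over [lower, upper].

    The bounds must be wide enough that the integrand is negligible outside them.
    """
    h = (upper - lower) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        x = lower + i * h
        indicator = 1.0 if x >= y else 0.0
        value = (cdf(x) - indicator) ** 2
        # End points get half weight in the trapezoidal rule.
        weight = 0.5 if i in (0, n_steps) else 1.0
        total += weight * value
    return total * h


# Example: standard normal prediction, observation at 0.
estimate = integrated_brier_numeric(lambda x: normal_cdf(x, 0.0, 1.0), 0.0, -10.0, 10.0)
```

The integrand generally jumps at x = y, so the accuracy depends on the grid resolution around the observation as well as on the choice of bounds. That is part of why it matters which package owns this approximation.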
The latter option seems the better solution to me, as mlr3proba then doesn't have to deal with the problems of numerical estimation, which distr6 has to handle anyway.
Let's just use the analytic version, which uses the integrated cdf and various 2-norms, and produce a warning (heuristic approximation, potentially unreliable) if these are not available.
In any case, I strongly prefer leaving any approximations to distr6 rather than mlr3proba, where, from a functionality/modularization perspective, they have no business being, in my opinion.
PS: I assume we are all talking about integrated Brier score aka rank probability score here.
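For reference, one common way to write this score for a predicted cdf F and an observation y is the following (the symbol IBS is just shorthand used here). The second form splits it into two squared 2-norms, of F to the left of y and of 1 - F to the right:

```latex
\mathrm{IBS}(F, y)
  = \int_{-\infty}^{\infty} \bigl( F(x) - \mathbf{1}\{x \ge y\} \bigr)^2 \, dx
  = \int_{-\infty}^{y} F(x)^2 \, dx + \int_{y}^{\infty} \bigl( 1 - F(x) \bigr)^2 \, dx
```

In the literature this quantity is usually called the continuous ranked probability score (CRPS).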