Feature request: Supporting Brier scores (and decompositions) for two-class outcomes

Open kdpsingh opened this issue 5 years ago • 4 comments

Dear tidymodels team,

I wanted to request the addition of Brier scores for measurement of performance for two-class outcome models.

The score is described here, along with 2-variable and 3-variable decompositions: https://en.wikipedia.org/wiki/Brier_score

I saw here that a Brier score was added, but I believe this was for survival outcomes only.

Thank you so much for the consideration and let me know if I can help in contributing these.

All the best,

Karandeep

kdpsingh, Feb 04 '20 05:02

The binary case looks straightforward to implement. The multiclass case is the "original" definition by Brier, so there would be no macro/micro extensions. I would probably end up calling it brier_score().

Regarding the decompositions, they look to be mainly for binary classifiers. Maybe we can have something like brier_score_decomp(data, truth, estimate, terms = 2 or 3). It wouldn't be a typical metric function, but it would be useful for looking at the decomposed values.

DavisVaughan, Feb 04 '20 17:02
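
For readers following along, here is a minimal sketch of the binary Brier score and the three-term decomposition (reliability, resolution, uncertainty) described on the linked Wikipedia page. It is plain Python rather than R, and the function names brier_binary and brier_decomposition are hypothetical; nothing here is yardstick's API. The decomposition groups observations by distinct forecast value, which is an assumption that makes the identity exact; with binned forecasts it would only hold approximately.

```python
from collections import defaultdict


def brier_binary(truth, prob):
    """Mean squared difference between predicted P(event) and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(truth, prob)) / len(truth)


def brier_decomposition(truth, prob):
    """Return (reliability, resolution, uncertainty).

    Observations are grouped by distinct forecast value, so that
    brier = reliability - resolution + uncertainty holds exactly.
    """
    n = len(truth)
    base_rate = sum(truth) / n

    # Collect the observed 0/1 outcomes for each distinct forecast value.
    groups = defaultdict(list)
    for y, p in zip(truth, prob):
        groups[p].append(y)

    reliability = 0.0
    resolution = 0.0
    for p, ys in groups.items():
        observed = sum(ys) / len(ys)  # event frequency within this group
        reliability += len(ys) * (p - observed) ** 2
        resolution += len(ys) * (observed - base_rate) ** 2

    uncertainty = base_rate * (1 - base_rate)
    return reliability / n, resolution / n, uncertainty


# Made-up data for illustration only.
truth = [1, 0, 1, 1, 0, 0]
prob = [0.8, 0.2, 0.8, 0.2, 0.2, 0.8]

score = brier_binary(truth, prob)
rel, res, unc = brier_decomposition(truth, prob)
print(score, rel - res + unc)  # the two values agree up to floating-point error
```

Returning the three terms as a tuple instead of a single number mirrors the point above that a decomposition would not behave like a typical metric function.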

Thanks Davis. From my standpoint, that sounds like exactly what I am looking for.

For multiclass situations, the Brier score can sum to more than 1, right (depending on the number of classes)?

kdpsingh, Feb 04 '20 17:02
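
On the multiclass question, a small sketch may help. Under the original definition (squared errors summed over all classes, then averaged over observations), the score can exceed 1, but its upper bound is 2 whatever the number of classes: the worst case puts all probability on one wrong class, which contributes 1 for the true class and 1 for the predicted class. What does depend on the number of classes K is the score of a uniform forecast, which is 1 - 1/K. The function name brier_multiclass is hypothetical and the code is plain Python, not yardstick.

```python
def brier_multiclass(truth, probs):
    """Original Brier definition for K classes.

    truth: list of true class indices (0-based).
    probs: list of per-class probability lists, one row per observation.
    """
    total = 0.0
    for label, row in zip(truth, probs):
        # Squared error against the one-hot encoding of the true class.
        total += sum((p - (1.0 if k == label else 0.0)) ** 2
                     for k, p in enumerate(row))
    return total / len(truth)


# Worst case: all probability mass on a wrong class.
worst_two = brier_multiclass([0], [[0.0, 1.0]])
worst_five = brier_multiclass([0], [[0.0, 0.0, 0.0, 0.0, 1.0]])

# Uniform forecasts score 1 - 1/K.
uniform_two = brier_multiclass([0], [[0.5, 0.5]])
uniform_five = brier_multiclass([0], [[0.2] * 5])

print(worst_two, worst_five, uniform_two, uniform_five)
```

Note that for two classes this form is twice the common binary form that uses only the event probability, because both columns contribute the same squared error.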

I'm a little surprised there's apparently no way to just apply one of the regression metrics to a classification model's predicted class probabilities.

jdonland, Apr 09 '22 04:04

Has there been any progress on this? There is quite a bit of statistical literature that, to put it colloquially, says that just about every measure based on the classification matrix is rotten and that Brier scores rock (see, e.g., https://www.fharrell.com/post/class-damage/). I don't understand much of the underlying statistics, but I do understand some of the key issues: strictly proper scoring rules like the Brier score are immune to gaming, so a better Brier score always means a better model, which is not always the case for measures like AUROC.

Will the Brier score be added to yardstick soon?

tripartio, Nov 22 '22 14:11
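
As a rough illustration of what "strictly proper" means for the binary Brier score: if the event truly occurs with probability q, the expected score of reporting p is q(1 - p)^2 + (1 - q)p^2, which is uniquely minimized at p = q. The sketch below checks this numerically on a grid. The names expected_brier, true_prob, and grid are hypothetical, and the example only demonstrates the honest-reporting property, not any broader claim about model comparison.

```python
def expected_brier(reported, true_prob):
    """Expected binary Brier score when the event has probability true_prob
    and the forecaster reports `reported`."""
    return true_prob * (1 - reported) ** 2 + (1 - true_prob) * reported ** 2


# Candidate reported probabilities: 0.00, 0.01, ..., 1.00.
grid = [i / 100 for i in range(101)]

true_prob = 0.3
best = min(grid, key=lambda p: expected_brier(p, true_prob))

print(best)  # the grid point with the lowest expected score
```

Shading the forecast away from the true probability in either direction increases the expected score, which is the sense in which the rule cannot be gamed.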