yardstick
Feature request: Supporting Brier scores (and decompositions) for two-class outcomes
Dear tidymodels team,
I would like to request adding the Brier score as a performance metric for two-class outcome models.
The score is described here, along with its two- and three-component decompositions: https://en.wikipedia.org/wiki/Brier_score
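For reference, the binary score and the decompositions on that page can be written as follows. Here f_t is the forecast probability of the event, o_t ∈ {0, 1} the outcome, N the number of observations, the forecasts take K distinct values f_k with n_k cases each, ō_k is the observed event rate among those cases, and ō the overall event rate. The decompositions are exact when observations are grouped by distinct forecast values.

```latex
\mathrm{BS} = \frac{1}{N}\sum_{t=1}^{N} (f_t - o_t)^2

\mathrm{BS} = \underbrace{\frac{1}{N}\sum_{k=1}^{K} n_k (f_k - \bar{o}_k)^2}_{\text{reliability}}
  - \underbrace{\frac{1}{N}\sum_{k=1}^{K} n_k (\bar{o}_k - \bar{o})^2}_{\text{resolution}}
  + \underbrace{\bar{o}\,(1 - \bar{o})}_{\text{uncertainty}}

\mathrm{BS} = \underbrace{\frac{1}{N}\sum_{k=1}^{K} n_k (f_k - \bar{o}_k)^2}_{\text{calibration}}
  + \underbrace{\frac{1}{N}\sum_{k=1}^{K} n_k\, \bar{o}_k (1 - \bar{o}_k)}_{\text{refinement}}
```

The two forms agree because refinement equals uncertainty minus resolution (the law of total variance applied to the outcomes).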
I saw here that a Brier score was added, but I believe this was for survival outcomes only.
Thank you so much for the consideration, and let me know if I can help contribute these.
All the best,
Karandeep
It looks straightforward to implement the binary case, and the multiclass case is Brier's "original" definition, so there would not be macro/micro extensions. I would probably end up calling it brier_score().
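A minimal sketch of the two cases, written in Python purely for illustration (yardstick itself is an R package). The helpers brier_binary() and brier_multiclass() are hypothetical names, not yardstick functions. The binary version averages the squared error of the event probability; Brier's original definition sums the squared error over every class column before averaging.

```python
def brier_binary(truth, prob_event):
    # Mean squared difference between the predicted event probability and the 0/1 outcome.
    return sum((p - y) ** 2 for y, p in zip(truth, prob_event)) / len(truth)

def brier_multiclass(truth, prob_rows, classes):
    # Brier's original definition: for each observation, sum the squared error over
    # all classes (one-hot outcome vs. probability vector), then average over observations.
    total = 0.0
    for y, row in zip(truth, prob_rows):
        total += sum((p - (1.0 if c == y else 0.0)) ** 2 for c, p in zip(classes, row))
    return total / len(truth)

truth = [1, 0, 1, 1, 0]
prob_event = [0.8, 0.2, 0.8, 0.6, 0.6]
binary = brier_binary(truth, prob_event)
# The same data as two-column probability rows [P(class 1), P(class 0)].
two_class = brier_multiclass(truth, [[p, 1 - p] for p in prob_event], classes=[1, 0])
```

With two classes the original definition gives exactly twice the binary value (0.256 vs. 0.128 here), because both columns contribute the same squared error. A brier_score() covering both cases would need to pick one convention and document it.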
The decompositions look to be mainly for binary classifiers. Maybe we could have something like brier_score_decomp(data, truth, estimate, terms = 2 or 3). It wouldn't be a typical metric function, but it would be useful for looking at the decomposed values.
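A rough Python sketch of what such a helper might compute, assuming a binary outcome coded 0/1 and grouping by distinct forecast values so the terms add up exactly. The function brier_decomp() and its terms argument only mirror the proposal above; they are hypothetical and not an existing yardstick API.

```python
from collections import defaultdict

def brier_decomp(truth, prob_event, terms=3):
    # Group the 0/1 outcomes by their distinct forecast value f_k.
    groups = defaultdict(list)
    for y, p in zip(truth, prob_event):
        groups[p].append(y)
    n = len(truth)
    base_rate = sum(truth) / n
    rel = res = ref = 0.0
    for f_k, ys in groups.items():
        n_k = len(ys)
        o_k = sum(ys) / n_k                      # observed event rate in group k
        rel += n_k * (f_k - o_k) ** 2            # reliability / calibration
        res += n_k * (o_k - base_rate) ** 2      # resolution
        ref += n_k * o_k * (1 - o_k)             # refinement (within-group variance)
    rel, res, ref = rel / n, res / n, ref / n
    if terms == 2:
        return {"calibration": rel, "refinement": ref}
    if terms == 3:
        return {"reliability": rel, "resolution": res,
                "uncertainty": base_rate * (1 - base_rate)}
    raise ValueError("terms must be 2 or 3")

truth = [1, 0, 1, 1, 0]
prob_event = [0.8, 0.2, 0.8, 0.6, 0.6]
three = brier_decomp(truth, prob_event, terms=3)
two = brier_decomp(truth, prob_event, terms=2)
```

For this toy data the three-term version gives reliability ≈ 0.028, resolution ≈ 0.14 and uncertainty 0.24, which recombine (reliability − resolution + uncertainty) to the Brier score of 0.128. With continuous forecasts one would typically bin the probabilities first, and then the terms no longer add up exactly.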
Thanks Davis. From my standpoint, that sounds like exactly what I am looking for.
For multiclass situations, the Brier score can sum to more than 1, right (depending on the number of classes)?
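A short derivation suggests the answer: under Brier's original multiclass definition the per-observation term can exceed 1 but is bounded by 2, whatever the number of classes K. Here p_k are the predicted class probabilities (non-negative, summing to 1), o_k is the one-hot outcome, and c is the observed class.

```latex
\sum_{k=1}^{K} (p_k - o_k)^2
  = \sum_{k=1}^{K} p_k^2 - 2\sum_{k=1}^{K} p_k o_k + \sum_{k=1}^{K} o_k^2
  = \sum_{k=1}^{K} p_k^2 - 2\,p_c + 1
  \le 2
```

The bound holds because the sum of p_k^2 is at most (sum of p_k)^2 = 1 and p_c ≥ 0; it is reached when all probability sits on a single wrong class. Averaging over observations therefore keeps the score in [0, 2] for any K ≥ 2.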
I'm a little surprised there's apparently no way to just apply one of the regression metrics to a classification model's predicted class probabilities.
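For what it's worth, the binary Brier score is just the mean squared error between the predicted event probability and the outcome recoded as a numeric 0/1 value, so a regression-style MSE (or a squared RMSE) on that recoding reproduces it. The Python sketch below is illustrative only; mse() is a hypothetical helper. For the multiclass definition, the cell-wise MSE over flattened one-hot indicators also averages over the K class columns, so it has to be multiplied by K.

```python
import math

def mse(observed, predicted):
    # Ordinary regression mean squared error.
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

# Binary case: code the event as 1.0 and the non-event as 0.0.
truth = [1, 0, 1, 1, 0]
prob_event = [0.8, 0.2, 0.8, 0.6, 0.6]
brier = mse([float(y) for y in truth], prob_event)
rmse = math.sqrt(brier)  # squaring an RMSE recovers the Brier score

# Multiclass case: flatten one-hot indicators and probability rows cell by cell.
classes = ["a", "b", "c"]
truth_mc = ["a", "c", "b"]
prob_mc = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.3, 0.4, 0.3]]
onehot = [1.0 if c == y else 0.0 for y in truth_mc for c in classes]
flat = [p for row in prob_mc for p in row]
brier_mc = mse(onehot, flat) * len(classes)  # undo the extra averaging over K columns
```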
Has there been any progress on this? There is quite a bit of statistical literature that, to put it colloquially, says that just about every measure based on the classification matrix is rotten and that Brier scores rock (e.g., see https://www.fharrell.com/post/class-damage/). I don't understand much of the underlying statistics, but I do understand one key point: strictly proper scoring rules like the Brier score are immune to gaming, because their expected value is optimized only by reporting the true probabilities. So a better Brier score always means a better model, which is not always the case for measures like AUROC.
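A small Python illustration of both points, under toy assumptions (made-up probabilities; expected_brier(), brier() and auc() are hypothetical helpers, not yardstick functions). First, scanning reported probabilities on a grid shows the expected Brier loss is smallest at the true probability. Second, a monotone distortion of the probabilities leaves the ranking, and hence AUROC, unchanged while ruining calibration, which the Brier score does detect.

```python
def expected_brier(p_true, q):
    # Expected Brier loss when the event truly occurs with probability p_true
    # but q is reported: loss (1 - q)^2 if it occurs, q^2 otherwise.
    return p_true * (1 - q) ** 2 + (1 - p_true) * q ** 2

def brier(truth, prob):
    # Binary Brier score: mean squared error between probability and 0/1 outcome.
    return sum((p - y) ** 2 for y, p in zip(truth, prob)) / len(truth)

def auc(truth, score):
    # AUROC as the probability that a random positive outscores a random
    # negative (ties count one half); it depends only on the ranking of scores.
    pos = [s for y, s in zip(truth, score) if y == 1]
    neg = [s for y, s in zip(truth, score) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# 1) Properness: the best reported value on the grid is the true probability.
p_true = 0.3
grid = [i / 100 for i in range(101)]
best_q = min(grid, key=lambda q: expected_brier(p_true, q))

# 2) Cubing the probabilities keeps their order, so AUROC is unchanged,
#    but the predictions become badly calibrated and the Brier score worsens.
truth = [1, 0, 1, 1, 0]
prob = [0.8, 0.2, 0.8, 0.6, 0.6]
distorted = [p ** 3 for p in prob]
auc_before, auc_after = auc(truth, prob), auc(truth, distorted)
brier_before, brier_after = brier(truth, prob), brier(truth, distorted)
```

In this toy example the AUROC stays at 5.5/6 ≈ 0.917 in both cases, while the Brier score rises from 0.128 to about 0.23.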
Will the Brier score be added to yardstick soon?