Scoring rules for metrics and losses
Disclaimer: I'm an engineer by training, not a statistician, so I may get some of this wrong. I don't think the general thrust has technical gaps, but I could be off in some of the details. If I am, please go talk to your friendly neighborhood stats professor and get their thoughts on the idea.
Background:
I hang out at Cross Validated, the statistics Stack Exchange site, where I have interacted and learned a ton over the years.
One of the most substantial families of threads there asks why accuracy is not an ideal metric in many settings. Many of the folks engaged in those discussions are fantastic PhDs, in academia and industry, who have been teaching or practicing for decades, so they are a very important source of technical wisdom.
Here are some of the threads there:
- https://stats.stackexchange.com/questions/359909/is-accuracy-an-improper-scoring-rule-in-a-binary-classification-setting
- https://stats.stackexchange.com/questions/357466/are-unbalanced-datasets-problematic-and-how-does-oversampling-purport-to-he
- https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models
- https://stats.stackexchange.com/questions/359909/is-accuracy-an-improper-scoring-rule-in-a-binary-classification-setting/359936#359936
- https://stats.stackexchange.com/questions/368949/example-when-using-accuracy-as-an-outcome-measure-will-lead-to-a-wrong-conclusio
They like these things called "strictly proper score functions" or "strictly proper scoring rules".
Here are references on strictly proper scoring rules (a rough paraphrase of the definition follows this list):
- https://apps.dtic.mil/sti/pdfs/ADA459827.pdf
- (of course) https://en.wikipedia.org/wiki/Scoring_rule#Proper_scoring_rules
- (other folks in github have engaged them) https://github.com/mlr-org/mlr/issues/880
- https://sites.stat.washington.edu/raftery/Research/PDF/Gneiting2007jasa.pdf
- https://www.tensorflow.org/probability/api_docs/python/tfp/stats/brier_score
- https://xianblog.wordpress.com/2017/11/21/the-hyvarinen-score-is-back/
- https://faculty.missouri.edu/~merklee/pub/MerkleSteyvers2013.pdf
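For anyone who doesn't want to chase those links, my rough paraphrase of the definition (see the Gneiting & Raftery paper above for the precise statement): a scoring rule $S(p, y)$ scores a predictive distribution $p$ against an observed outcome $y$, and it is strictly proper when reporting the true distribution is the unique best move in expectation:

$$
\mathbb{E}_{y \sim q}\big[S(q, y)\big] \;\ge\; \mathbb{E}_{y \sim q}\big[S(p, y)\big] \quad \text{for every forecast } p,
$$

with equality only when $p = q$ (for a positively oriented score; the inequality flips if the rule is stated as a loss). Log loss and the Brier score satisfy this; thresholded accuracy does not, which is the crux of the threads above.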
When I go to the Keras loss and metrics pages, I don't see those scoring rules called out explicitly, and I think that's a miss. Some of them may be in there under other names, but if so, I missed them.
Current losses from documentation:
- binary/categorical cross-entropy (this is the log loss, i.e. the logarithmic scoring rule; a quick check follows this list)
- KL divergence
- Poisson class
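For what it's worth, a quick numerical check (my own, not from the docs) suggests the existing categorical cross-entropy is already the negative logarithmic scoring rule, i.e. the negative log probability assigned to the observed class:

```python
import numpy as np
import tensorflow as tf

# Categorical cross-entropy vs. the negative log score of the observed class.
y_true = np.array([[0.0, 1.0, 0.0]])   # one-hot outcome
y_pred = np.array([[0.2, 0.7, 0.1]])   # predicted class probabilities

cce = tf.keras.losses.CategoricalCrossentropy()
print(float(cce(y_true, y_pred)))       # ~0.3567
print(-np.log(0.7))                     # ~0.3567, the (negated) logarithmic score
```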
Current metrics from documentation:
- Accuracy
- Binary/Categorical/TopK accuracy
- Binary/categorical crossentropy
- AUC, plus the true/false positive/negative based measures
Recommendation/Suggestion:
I think you should add the following "strictly proper scoring rules" to Keras, because it would make it easier for new users (and their pointy-haired bosses) to use technically exemplary approaches in some of their problem solving; a rough sketch of two of them as custom losses follows the list below.
Some rules to consider:
- Brier/quadratic scoring rule
- Hyvarinen scoring rule
- Spherical scoring rule
- Logarithmic scoring rule (log probability)
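To make the ask concrete, here is a minimal sketch (my own code, not existing Keras API; the function names are placeholders) of what two of these could look like as custom tf-keras losses, assuming the model outputs class probabilities (e.g. from a softmax) and the targets are one-hot:

```python
import tensorflow as tf

def brier_score(y_true, y_pred):
    # Quadratic/Brier scoring rule: mean squared difference between the
    # predicted probability vector and the one-hot outcome, per sample.
    return tf.reduce_mean(tf.square(y_pred - y_true), axis=-1)

def spherical_score(y_true, y_pred, eps=1e-7):
    # Spherical scoring rule: probability assigned to the observed class,
    # normalized by the L2 norm of the full probability vector.
    # Negated so that "lower is better", matching the Keras loss convention.
    observed = tf.reduce_sum(y_true * y_pred, axis=-1)
    norm = tf.norm(y_pred, axis=-1) + eps
    return -observed / norm

# Usage sketch: as the training loss and/or as extra metrics.
# model.compile(optimizer="adam", loss=brier_score,
#               metrics=[spherical_score, "accuracy"])
```

Built-in, documented versions (with the same conveniences the cross-entropy losses already have, such as from_logits and label_smoothing) would be much nicer than everyone rolling their own.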