Question on backtesting of predictions

Open larssl780 opened this issue 6 years ago • 5 comments

Great tool! Do you have any stats on how the prediction model performs on different surfaces, rounds, etc.? Basically, if the model says the odds of a player winning are 2, does that player win 50% of the time on average?

larssl780 avatar Jul 17 '18 13:07 larssl780

Here is the current state, tested on a sample of seasons from 2005 to 2018 (May/June):

HARD_OUTDOOR PredictionResult{rate=71.104%, predictable=96.518%, brier=0.19026, logLoss=0.56009, score=1.26950, calibration=0.98534, matches=15769}

CLAY PredictionResult{rate=69.526%, predictable=98.591%, brier=0.19893, logLoss=0.58052, score=1.19763, calibration=0.97839, matches=12914}

GRASS PredictionResult{rate=71.279%, predictable=95.711%, brier=0.19000, logLoss=0.56030, score=1.27215, calibration=0.99381, matches=3987}

HARD INDOOR / CARPET PredictionResult{rate=68.879%, predictable=98.231%, brier=0.20350, logLoss=0.59259, score=1.16233, calibration=0.99636, matches=6274}

As for your question about whether a predicted probability of 50% (odds of 2) really corresponds to a 50-50 outcome:

This is answered by the 'calibration' attribute: the closer it is to 1, the more closely the distribution of outcomes matches the predicted probabilities. If it is lower than 1, favorites are slightly underestimated; if it is over 1, favorites are overestimated.

As can be seen, grass and hard outdoor are the most predictable, while clay and hard indoor are less so. However, the hard indoor score is skewed because there are only a few best-of-5 matches on hard indoor (i.e. no Grand Slams, just Davis Cup). Usually, best-of-5 gives about 8 percentage points more prediction accuracy than best-of-3.

mcekovic avatar Jul 18 '18 10:07 mcekovic

| Best-Of | Hard Outdoor | Clay | Grass | Hard Indoor / Carpet | Overall |
|---|---|---|---|---|---|
| 3 | 68.822% | 67.813% | 68.738% | 68.671% | 68.422% |
| 5 | 76.597% | 76.118% | 75.918% | 76.582% | 76.336% |

mcekovic avatar Jul 18 '18 12:07 mcekovic

TODO: Add a blog entry with more info about the TCB predictor performance and tuning.

mcekovic avatar Aug 01 '18 08:08 mcekovic

What would also be cool is to compare your backdated odds with actual outcomes. For example, you could create a bin for odds between 1.5 and 1.52 and then check the actual outcomes: if the odds are "fair", you'd expect a player with those odds to win between ~65.8% and ~66.7% of the time.
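
The bin check described above could be sketched roughly as follows. This is only an illustration: the function names and the `(decimal_odds, won)` input format are assumptions made for the example, not anything taken from the tool itself, and the sample data is made up.

```python
def implied_probability(decimal_odds):
    """Win probability implied by 'fair' decimal odds (no bookmaker margin)."""
    return 1.0 / decimal_odds


def check_odds_bin(predictions, low, high):
    """Compare observed win rate with the implied range for odds in [low, high).

    `predictions` is a hypothetical list of (decimal_odds, won) pairs,
    where `won` is True if the player with those odds won the match.
    Returns None if no prediction falls into the bin.
    """
    outcomes = [won for odds, won in predictions if low <= odds < high]
    if not outcomes:
        return None
    return {
        "matches": len(outcomes),
        "observed": sum(outcomes) / len(outcomes),
        # Higher odds imply a lower probability, so the bounds swap.
        "expected_low": implied_probability(high),
        "expected_high": implied_probability(low),
    }


# Made-up sample, for illustration only.
sample = [(1.50, True), (1.51, True), (1.51, False), (1.60, True)]
result = check_odds_bin(sample, 1.5, 1.52)
print(result)
```

With enough matches per bin, an observed rate that sits inside the expected range for most bins would suggest the odds are close to fair.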

larssl780 avatar Aug 01 '18 16:08 larssl780

This kind of metric is measured with Log-Loss and Brier Score; some info is above.
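
For reference, here is a minimal sketch of the textbook definitions of the two metrics for binary outcomes. It assumes each prediction is the probability given to one player and each outcome is 1 if that player won, 0 otherwise; the function names are made up for the example, and the exact formulas used in TCB may differ.

```python
import math


def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


def log_loss(probabilities, outcomes, eps=1e-15):
    """Mean negative log-likelihood of the outcomes under the predictions."""
    total = 0.0
    for p, o in zip(probabilities, outcomes):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(o * math.log(p) + (1 - o) * math.log(1 - p))
    return total / len(outcomes)


# A predictor that always says 50% scores 0.25 (Brier) and ln 2 (log-loss);
# lower is better for both.
coin_flip_brier = brier_score([0.5, 0.5], [1, 0])
coin_flip_log_loss = log_loss([0.5, 0.5], [1, 0])
print(coin_flip_brier, coin_flip_log_loss)
```

Those coin-flip values, 0.25 and about 0.693, are a useful baseline when reading the per-surface numbers above.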

mcekovic avatar Aug 08 '18 08:08 mcekovic