
PPC Calibration plots

TeemuSailynoja opened this pull request 7 months ago • 4 comments

This is my work in progress on the PAVA calibration plots discussed in #343.

Currently implemented:

  • ppc_calibration_overlay()
  • ppc_calibration_overlay_grouped()
  • ppc_calibration()
  • ppc_calibration_grouped()
  • .ppc_calibration_data() - internal function
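For context, "PAVA" refers to the pool-adjacent-violators algorithm used to estimate monotone calibration curves. A minimal sketch of the underlying idea using base R's `stats::isoreg` (this is only an illustration of the technique, not the internal `.ppc_calibration_data()` implementation):

```r
# Sketch only: a PAVA calibration curve via base R's isotonic regression.
# This illustrates the idea, not bayesplot's actual implementation.
set.seed(1)
p <- runif(1000)             # predicted event probabilities
y <- rbinom(1000, 1, p^1.3)  # binary outcomes from a miscalibrated model
ord <- order(p)
fit <- isoreg(p[ord], y[ord])  # isotonic (PAVA) fit of outcomes on predictions
# fit$yf is the monotone calibration curve evaluated at the sorted
# predictions; for a well-calibrated model it should track the diagonal.
```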

Needs:

  • [x] Fast example to test the functions
  • [ ] Fix intervals in ppc_calibration()
  • [x] Example usage in the documentation
  • [x] LOO versions
  • [ ] Should .ppc_calibration_data() be exposed to users?
  • [x] Tests
  • [ ] Check that the input parameter names and default values are sensible and intuitive
  • [ ] Add documentation and comments to the code

TeemuSailynoja · May 19 '25 15:05

Codecov Report

Attention: Patch coverage is 0% with 134 lines in your changes missing coverage. Please review.

Project coverage is 96.35%. Comparing base (527c48c) to head (14eb2dc).

Files with missing lines Patch % Lines
R/ppc-calibration.R 0.00% 134 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #352      +/-   ##
==========================================
- Coverage   98.60%   96.35%   -2.25%     
==========================================
  Files          35       36       +1     
  Lines        5650     5784     +134     
==========================================
+ Hits         5571     5573       +2     
- Misses         79      211     +132     


codecov-commenter · May 19 '25 15:05

Examples

The examples below should allow for some quick tests of these functions.

Creating example data

library(bayesplot)
rng <- range(example_y_data(), example_yrep_draws())
ymin <- rng[1]
ymax <- rng[2]
# Observations and posterior predictive probabilities.
y <- rbinom(length(example_y_data()), 1, (example_y_data() - ymin) / (ymax - ymin))
prep <- (example_yrep_draws() - ymin) / (ymax - ymin)
groups <- example_group_data()

PAVA Calibration overlay

Basic

ppc_calibration_overlay(y, prep[1:50,])


Grouped

ppc_calibration_overlay_grouped(y, prep[1:50,], groups)


PAVA Calibration

This isn't quite what we want yet: the interval shown here is not the one from the paper. There, we use consistency intervals, that is, intervals centered at the diagonal that display where the calibration curve of a consistent (well-calibrated) model should lie; the model's curve should stay within these bounds. In this implementation, I'm instead plotting a confidence interval, which shows where we think the model's curve lies; for a calibrated model, the diagonal should fall inside it.

ppc_calibration(y, prep)


ppc_calibration_grouped(y, prep, groups)


TeemuSailynoja · May 22 '25 12:05

This all sounds good, thanks @TeemuSailynoja. I made a few small review comments/questions. In addition to those questions, when you say

This isn't yet quite what we want. Now the interval is not what we show in the paper.

you mean that we will want to change this to use the consistency intervals from the paper, right? Do you think it's at all useful to give the user the option to choose which kind of interval, or is it just strictly better to use the consistency intervals? I hadn't really thought about that.

Leaving the option to choose is perhaps best, as long as the difference is explained in the documentation.

Confidence = "where we think the calibration curve of our model lies." Consistency = "where the curve of a consistent (calibrated) model should lie."
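To make the distinction concrete, here is a rough sketch of how a consistency band could be simulated (assumed details for illustration, not the code in this PR): resample outcomes from the predicted probabilities, i.e., under the assumption that the model is calibrated, recompute the PAVA curve each time, and take pointwise quantiles.

```r
# Hypothetical sketch of a consistency band (not this PR's implementation):
# simulate outcomes under perfect calibration and envelope the PAVA curves.
set.seed(2)
n <- 500
p <- sort(runif(n))                  # predicted probabilities
curves <- replicate(200, {
  y_sim <- rbinom(n, 1, p)           # outcomes consistent with p
  isoreg(p, y_sim)$yf                # PAVA calibration curve for this draw
})
band <- apply(curves, 1, quantile, probs = c(0.05, 0.95))
# A calibrated model's observed curve should stay between band[1, ] and
# band[2, ]; by construction the band is centered near the diagonal.
```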

TeemuSailynoja · Jun 03 '25 12:05

Ok great, thanks for the replies. Sounds good to me.

jgabry · Jun 03 '25 16:05