Updating residual plots

Open • behramulukir opened this issue 6 months ago • 3 comments

In #343, @TeemuSailynoja highlighted the lack of residual plots, especially for discrete observations. The problem is that the ppc_error_scatter_vs_x scatter plot doesn't work for discrete observations, and ppc_error_binned doesn't currently support covariates on the x-axis. As he suggested, I think it makes sense to add an optional x argument to ppc_error_binned that works similarly to ppc_interval. That way, users will be able to plot residuals against x, and because the argument is optional, the change won't break any existing plots.
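To make the binned idea concrete, here is a minimal Python sketch. It is not bayesplot code, since bayesplot is an R package. The helper `binned_residuals` is hypothetical. It sorts observations either by the proposed optional covariate `x` or, when `x` is omitted, by the predicted values. It then splits them into roughly equal-count bins and returns the mean binning value and the mean residual `y - y_pred` for each bin. In a plotting function this would presumably be repeated for each posterior predictive draw.

```python
from statistics import fmean


def binned_residuals(y, y_pred, x=None, n_bins=4):
    """Hypothetical helper: mean residual y - y_pred per bin.

    Bins by x when given (the proposed optional covariate); otherwise
    bins by the predicted values y_pred. Bins hold roughly equal counts.
    Returns a list of (mean binning value, mean residual) tuples.
    """
    if len(y) != len(y_pred):
        raise ValueError("y and y_pred must have the same length")
    by = list(y_pred) if x is None else list(x)
    if len(by) != len(y):
        raise ValueError("x must have the same length as y")
    # Sort observation indices by the binning variable (stable for ties).
    order = sorted(range(len(y)), key=lambda i: by[i])
    n = len(order)
    bins = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:  # more bins than observations: skip empty bins
            continue
        bins.append((
            fmean(by[i] for i in idx),
            fmean(y[i] - y_pred[i] for i in idx),
        ))
    return bins
```

With `x=None` the bins fall back to the predicted values. In this sketch, that is why making `x` optional leaves the existing behaviour unchanged.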

Another point regarding residual plots is the new plot, possibly named ppc_residual, suggested by @jgabry and @avehtari in #349. That function would plot y - stat(y_rep) on the y-axis against stat(y_rep) on the x-axis. In my opinion this is also a good plot to implement, since it lets users analyse a different aspect of the data.
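Here is a minimal Python sketch of the points such a plot would show. It assumes `yrep` is laid out as draws by observations, as bayesplot's `yrep` matrices are, and that `stat` is applied over the draws for each observation. The name `residual_points` is hypothetical and is not part of bayesplot. `stat` defaults to the mean but could be any summary, for example `statistics.median`.

```python
from statistics import fmean


def residual_points(y, yrep, stat=fmean):
    """Hypothetical helper: (stat(y_rep_i), y_i - stat(y_rep_i)) per observation.

    yrep is a list of draws, each a list with one value per observation.
    The first element of each tuple is the x-coordinate, the second the residual.
    """
    n_obs = len(y)
    if any(len(draw) != n_obs for draw in yrep):
        raise ValueError("every draw in yrep must have len(y) values")
    points = []
    for i in range(n_obs):
        s = stat([draw[i] for draw in yrep])  # summary over draws for observation i
        points.append((s, y[i] - s))
    return points
```

For `y = [2, 3]` and three draws `[[0, 2], [2, 4], [1, 3]]`, the means over draws are 1.0 and 3.0. The points are therefore (1.0, 1.0) and (3.0, 0.0).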

I am willing to work on both of these functions, and on anything else that needs updating regarding residual plots. However, I am not sure where to start, since there are open PRs connected to residual plots.

behramulukir, Jul 02 '25 14:07

Yeah, I think we should make new functions for these residual plots. We can merge #349 with @tjmahr's changes to ppc_error_scatter_avg and then create separate functions as @avehtari described in https://github.com/stan-dev/bayesplot/pull/349#issuecomment-2903407955.

jgabry, Jul 02 '25 16:07

I'll clean up that commit, pull in from the latest release, etc.

tjmahr, Jul 02 '25 17:07

Multiple discussion points for this:

  1. If we want the ppc_residual_* functions to allow for discrete observations, should we then implement ppc_residual_binned() (like ppc_error_binned())?
  2. About the PAVA-transformed residuals from the PPC paper: the PAVA-residual plot is actually of the form stat(cep_y - p_pred), where cep_y is a matrix of conditional event probabilities obtained by PAVA-transforming y based on the predictive probability samples in p_pred. So, under the chosen function naming, should this actually be called ppc_error_pava()?
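Here is a minimal Python sketch of these residuals for binary y. It assumes cep_y is computed separately for each draw of predictive probabilities. For each draw, observations are ordered by that draw's predicted probabilities. PAVA (the pool-adjacent-violators algorithm for isotonic regression) then turns the ordered 0/1 outcomes into nondecreasing conditional event probabilities. Finally, stat is applied over draws to cep_y - p_pred. All function names are hypothetical and are not bayesplot code. Ties in the predicted probabilities are not pooled in this sketch.

```python
from statistics import fmean


def pava(values):
    """Pool-adjacent-violators: least-squares nondecreasing fit to `values`."""
    blocks = []  # each block is [sum, count] of pooled adjacent values
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def pava_residuals(y, p):
    """Hypothetical helper: cep - p for one draw p of predicted probabilities."""
    order = sorted(range(len(y)), key=lambda i: p[i])  # ties keep input order
    fitted = pava([y[i] for i in order])
    cep = [0.0] * len(y)
    for rank, i in enumerate(order):
        cep[i] = fitted[rank]  # map fitted values back to original positions
    return [cep[i] - p[i] for i in range(len(y))]


def pava_residual_stat(y, p_pred, stat=fmean):
    """Hypothetical helper: stat over draws of cep_y - p_pred, per observation."""
    per_draw = [pava_residuals(y, p) for p in p_pred]
    return [stat([r[i] for r in per_draw]) for i in range(len(y))]
```

Because cep_y depends on how each draw orders the observations, it varies from draw to draw. In this sketch, that is why cep_y is a matrix rather than a single vector.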

TeemuSailynoja, Jul 10 '25 08:07