Updating residual plots
In #343, @TeemuSailynoja highlighted the lack of residual plots, especially for discrete observations. The problem is that ppc_error_scatter_vs_x doesn't work for discrete observations, and ppc_error_binned doesn't currently support covariates on the x-axis. As he suggested, I think it makes sense to add an optional x argument to ppc_error_binned that would work similarly to ppc_interval. That way, users can plot residuals against x, and because the argument is optional, the change won't break any existing plots.
Another point regarding residual plots is a new plot, possibly named ppc_residual, suggested by @jgabry and @avehtari in #349. That function would plot y - stat(y_rep) on the y-axis and stat(y_rep) on the x-axis. In my opinion this is also worth implementing, since it lets users analyse a different aspect of the data.
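To make the proposed quantities concrete, here is a minimal, language-agnostic sketch in Python (not bayesplot code, and not a proposed API). It assumes y_rep is stored as a list of draws with one value per observation, and the helper name residual_points is hypothetical.

```python
from statistics import mean


def residual_points(y, y_rep, stat=mean):
    """Return one (x, residual) pair per observation.

    x is stat(y_rep) taken over draws for that observation, and
    residual is y - stat(y_rep). Hypothetical helper for illustration.
    """
    n = len(y)
    # Every draw must contain exactly one value per observation.
    if any(len(draw) != n for draw in y_rep):
        raise ValueError("each draw in y_rep must have len(y) values")
    points = []
    for i in range(n):
        center = stat([draw[i] for draw in y_rep])
        points.append((center, y[i] - center))
    return points


# Three observations, two posterior predictive draws.
y = [1.0, 3.0, 4.0]
y_rep = [
    [0.0, 2.0, 3.0],
    [2.0, 2.0, 5.0],
]
points = residual_points(y, y_rep)
```

Passing a different stat (for example statistics.median) changes both axes at once, which matches the y - stat(y_rep) versus stat(y_rep) description.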
I am willing to work on both of these functions, and on more if there is anything else to update regarding residual plots. However, I am not sure where to start, since there are open PRs connected to residual plots.
Yeah I think we should make new functions for these residual plots. We can merge #349 with @tjmahr's changes to ppc_error_scatter_avg and then create separate functions like @avehtari described in https://github.com/stan-dev/bayesplot/pull/349#issuecomment-2903407955.
I'll clean up that commit, pull in from the latest release, etc.
Multiple discussion points for this:
- If we want the ppc_residual_* functions to allow for discrete observations, should we then implement ppc_residual_binned() (like ppc_error_binned())?
- About the PAVA-transformed residuals from the PPC-paper. The PAVA-residual plot is actually of the form stat(cep_y - p_pred), where cep_y is a matrix of conditional event probabilities obtained by PAVA transforming y based on the predictive probability samples in p_pred. So would this then actually be called ppc_error_pava() with the chosen function naming?
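
For reference, the stat(cep_y - p_pred) form can be sketched as follows. This is a rough Python illustration under stated assumptions, not the implementation from the paper or from bayesplot: it assumes binary y, assumes the PAVA transform is applied separately to each draw of p_pred, does no special handling of tied probabilities, and uses the mean as stat. The function names pava and pava_residuals are hypothetical.

```python
from statistics import mean


def pava(values):
    """Non-decreasing least-squares fit via pool-adjacent-violators."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Pool neighbours while the previous block mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def pava_residuals(y, p_draw):
    """cep_y - p_pred for one draw of predictive probabilities."""
    # Sort observations by predicted probability, fit, then undo the sort.
    order = sorted(range(len(y)), key=lambda i: p_draw[i])
    fit_sorted = pava([y[i] for i in order])
    cep = [0.0] * len(y)
    for rank, i in enumerate(order):
        cep[i] = fit_sorted[rank]
    return [c - p for c, p in zip(cep, p_draw)]


y = [0, 1, 0, 1]
p_pred = [
    [0.1, 0.4, 0.6, 0.9],  # draw 1
    [0.2, 0.3, 0.7, 0.8],  # draw 2
]
residuals_by_draw = [pava_residuals(y, p) for p in p_pred]
# stat over draws (here the mean), one value per observation.
summary = [mean(col) for col in zip(*residuals_by_draw)]
```

Whether stat should be applied over draws per observation, as assumed here, is part of what the naming question depends on.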