add quantile dot plot functions

Open behramulukir opened this issue 6 months ago • 6 comments

As I mentioned in #354, I have been working on adding quantile dot plots. It turned out to be more complicated than I expected, so I wanted to check that this is heading in the right direction.

Current Progress

So far, I have implemented two functions: ppc_qdotplot, which uses geom_dotplot under the hood, and ppc_qdotplot_ggdist, which uses stat_dots from ggdist.

qdotplot

This function uses geom_dotplot, which offers a very simple way to build dot plots. However, geom_dotplot has two main deficiencies:

  1. The dots often need more space than the plot provides, and since geom_dotplot does not automatically size dots to fit, the plot is clipped and does not show the full picture.
  2. It has no functionality for plotting quantiles.

To mitigate the first problem, I implemented a simple auto-sizing algorithm. It performs well when the bin width is left unspecified or set to a reasonable value, but even a slightly unsuitable bin width is likely to cause problems. Here are some example uses of ppc_qdotplot:

color_scheme_set("brightblue")
y <- example_y_data()
yrep <- example_yrep_draws()
group <- example_group_data()
ppc_qdotplot(y, yrep[1:8, ])

plot1

ppc_qdotplot(y, yrep[1:8, ], binwidth = 5)

plot2

ppc_qdotplot(y, yrep[1:8, ], binwidth = 3)

plot6

ppc_qdotplot(y, yrep[1:8, ], binwidth = 10)

plot3

As you can see, for certain bin widths the dots are either over-compressed or overflow the plot area.
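
To make the auto-sizing idea concrete, here is a minimal sketch of one possible heuristic in base R. It is not the code used in ppc_qdotplot: `auto_binwidth()`, its `aspect` argument, and the candidate grid are all hypothetical names and choices made for illustration. It assumes fixed-width bins and dots whose diameter equals the bin width, and it picks the largest candidate bin width whose tallest stack still fits in the panel.

```r
# Hypothetical helper (not part of bayesplot or ggplot2): choose the largest
# binwidth whose tallest dot stack still fits in the panel.
auto_binwidth <- function(x, aspect = 1, n_candidates = 100) {
  stopifnot(is.numeric(x), aspect > 0)
  x <- x[is.finite(x)]
  stopifnot(length(x) > 1)

  rng <- range(x)
  width <- diff(rng)
  if (width == 0) return(NA_real_)  # all values identical: nothing to size

  # Candidate binwidths, from coarse (1/5 of the range) to fine (1/200).
  candidates <- width / seq(5, 200, length.out = n_candidates)

  for (bw in candidates) {
    # Assign each value to a fixed-width bin and count the fullest bin.
    bins <- as.integer(floor((x - rng[1]) / bw)) + 1L
    tallest <- max(tabulate(bins))

    # Assumption: a dot is `bw` wide in data units, so a stack of `tallest`
    # dots needs tallest * bw units, and the panel offers aspect * width.
    if (tallest * bw <= aspect * width) return(bw)
  }

  # Nothing fit: fall back to the finest candidate.
  min(candidates)
}

set.seed(1)
x <- rnorm(100)
bw <- auto_binwidth(x)
```

A heuristic like this only helps when the bin width is chosen automatically. Once the user supplies their own `binwidth`, the stack height is fixed by the data, which matches the behaviour in the plots above.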

qdotplot_ggdist

The second alternative relies on stat_dots from ggdist. It handles edge cases somewhat better overall and has a warning system; however, it also adds a dependency to the package, so I am unsure whether we want to go that route. The advantages of this over ppc_qdotplot are:

  1. When the dots do not fit the plot size, it warns the user.
  2. It can plot based on quantiles.

ppc_qdotplot_ggdist(y, yrep[1:8, ])

plot4

ppc_qdotplot_ggdist(y, yrep[1:8, ], binwidth = 3)

plot7

ppc_qdotplot_ggdist(y, yrep[1:8, ], binwidth = 5)

plot5

ppc_qdotplot_ggdist(y, yrep[1:8, ], binwidth = 10)

plot8

For the previous three plots, where the dots do not fit into the plot area, the following warning is printed to the console:

Warning messages:
1: The provided binwidth will cause dots to overflow the boundaries of the geometry.
→ Set `binwidth = NA` to automatically determine a binwidth that ensures dots fit within the bounds,
→ OR set `overflow = "compress"` to automatically reduce the spacing between dots to ensure the dots fit within the bounds,
→ OR set `overflow = "keep"` to allow dots to overflow the bounds of the geometry without producing a warning.
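
For illustration, the first two options named in this warning can be tried directly on ggdist's stat_dots. This is only a sketch on made-up data, not the ppc_qdotplot_ggdist implementation; it assumes a ggdist version recent enough to have the `overflow` argument mentioned in the warning, and it is guarded so that it does nothing when ggplot2 or ggdist is not installed.

```r
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggdist", quietly = TRUE)) {
  set.seed(1)
  df <- data.frame(value = rnorm(200))  # made-up data for illustration

  # Option 1: let ggdist choose a binwidth that keeps the dots in bounds.
  p_auto <- ggplot2::ggplot(df, ggplot2::aes(x = value)) +
    ggdist::stat_dots(binwidth = NA)

  # Option 2: keep the requested binwidth, but compress the spacing
  # between dots so the stacks fit within the bounds.
  p_compress <- ggplot2::ggplot(df, ggplot2::aes(x = value)) +
    ggdist::stat_dots(binwidth = 0.5, overflow = "compress")
}
```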

ppc_qdotplot_ggdist can be used with quantiles as follows:

ppc_qdotplot_ggdist(y, yrep[1:8, ], quantile = 25)

plot9
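
For readers unfamiliar with quantile dot plots: instead of one dot per draw, the plot shows a fixed number of dots, each placed at an evenly spaced quantile of the draws. The base R sketch below shows that reduction step only. `quantile_dots()` is a hypothetical helper, and the use of `ppoints()` for the probabilities is an assumption about a reasonable choice, not a statement about what ggdist or ppc_qdotplot_ggdist does internally.

```r
# Hypothetical helper: reduce a vector of draws to `n` evenly spaced quantiles.
quantile_dots <- function(x, n = 25) {
  stopifnot(is.numeric(x), n >= 1)
  probs <- stats::ppoints(n)  # n probabilities strictly between 0 and 1
  stats::quantile(x, probs = probs, names = FALSE, na.rm = TRUE)
}

set.seed(1)
draws <- rnorm(4000)               # made-up draws
q <- quantile_dots(draws, n = 25)  # 25 values, one per dot
```

With `n = 25`, each dot stands for roughly 4% of the probability mass, which is what makes the plot readable as frequencies.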

Future

To move forward, I think we need to decide which function to keep and whether we want to add the ggdist dependency. After that, I can remove the unneeded function and continue with the rest of the implementation.

  • [x] implement geom_dotplot version
  • [x] implement stat_dots
  • [x] decide on which function to keep
  • [x] implement the ppd version
  • [x] update the documentation
  • [x] implement tests

behramulukir avatar Jun 20 '25 15:06 behramulukir

I didn't realize there would be this sort of difficulty with the dot plots. I think handling those edge cases and having appropriate warnings is probably enough of a benefit to add the ggdist dependency, and maybe we can also use ggdist for other plots in the future. For now we could just add it to Suggests (instead of Imports) and then use our internal suggested_package() function to make sure it's installed if someone wants to make the dot plots. If we end up using ggdist for a bunch of other plots in the future, we can move it to Imports so that it gets installed automatically. @avehtari @TeemuSailynoja what do you think?
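
A rough sketch of that Suggests pattern, for anyone following along. `check_suggested()` is a hypothetical stand-in written for this example; bayesplot's internal suggested_package() may differ in signature and error message.

```r
# Hypothetical stand-in for an internal "is this suggested package installed?"
# check. It fails with an informative error only when the feature is used.
check_suggested <- function(pkg) {
  stopifnot(is.character(pkg), length(pkg) == 1)
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      "Please install the '", pkg, "' package to use this function.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# A package that ships with R passes silently.
check_suggested("stats")
```

A plotting function would call the check with "ggdist" as its first step, so users who never make dot plots never need ggdist installed.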

jgabry avatar Jun 23 '25 16:06 jgabry

This solution makes sense to me. If @TeemuSailynoja or @avehtari don't have any objection, I can work on this.

behramulukir avatar Jun 23 '25 16:06 behramulukir

I like ggdist, so I'm fine using it more

avehtari avatar Jun 23 '25 16:06 avehtari

I think adding ggdist as a suggested package is very good for this.

TeemuSailynoja avatar Jun 23 '25 17:06 TeemuSailynoja

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 98.62%. Comparing base (da5c707) to head (cd9f63f). Report is 9 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #357      +/-   ##
==========================================
- Coverage   98.64%   98.62%   -0.03%     
==========================================
  Files          35       35              
  Lines        5550     5600      +50     
==========================================
+ Hits         5475     5523      +48     
- Misses         75       77       +2     

codecov-commenter avatar Jun 26 '25 11:06 codecov-commenter

Thanks, I'll look into all of these and update the code

behramulukir avatar Jun 26 '25 19:06 behramulukir