Convenient geoms

Open DominiqueMakowski opened this issue 3 years ago • 14 comments

One way of robustifying our plots and making see useful beyond being a plotting companion to the other packages would be to add some useful geoms. We have some early tests in geom_violinhalf / geom_violinpoint / geom_poolpoint, but these were done back in the days when these functionalities were not easily available and when we (I) didn't have much ggplot programming experience (it's still not my area). As a result, the current geoms are not really as flexible and robust as they should be.

Here are some examples of useful geoms:

  • [ ] Updating geom_violinhalf (or simply removing) using ggdist's geom_halfeye
  • [ ] Updating geom_violinpoint based on the above
  • [ ] Adding some sort of geom_numericdescribe (or something like that) that would combine the raw jittered points on one side, their distribution as a half violin (or dots-density) on the other (so far resembling a classic raincloud plot), and why not some sleek shading for the quantiles behind the points. It would be super useful to have a one-geom solution for an elegant summary of points, which we could add as a background for pointranges related to estimated means/CIs
  • [ ] geom_lightbeam as a helper for robust lighthouse plots
  • [ ] ...
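
For illustration, here is a rough sketch of the raincloud-style combination described in the list above, built from existing ggdist layers rather than a new geom. It assumes ggdist is installed and that `stat_halfeye()` and `stat_dots()` accept the `side` argument as in recent ggdist releases; the object name `p_raincloud` and the specific scale/alpha values are arbitrary choices, not anything proposed in this issue.

```r
library(ggplot2)
library(ggdist)

# Raincloud-style stack (a sketch, not the proposed geom):
# density slab + point/interval summary on top, raw observations as dots below.
p_raincloud <- ggplot(iris, aes(x = Sepal.Length, y = Species, fill = Species)) +
  # "cloud": half-density with median point and 50% / 95% intervals
  ggdist::stat_halfeye(
    side = "top",
    .width = c(0.5, 0.95),
    scale = 0.6,
    alpha = 0.6
  ) +
  # "rain": one dot per observation, drawn on the opposite side
  ggdist::stat_dots(
    side = "bottom",
    scale = 0.4,
    alpha = 0.6
  ) +
  theme_minimal()

p_raincloud
```

A dedicated geom would mostly wrap this kind of layer stack behind a single call, which is where the flexibility of the underlying layers would matter.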

DominiqueMakowski avatar May 31 '21 14:05 DominiqueMakowski

We can also consider an alias geom_raincloud. I can take a look at the existing geoms.

bwiernik avatar May 31 '21 22:05 bwiernik

The big issue with ggdist (which I love dearly) is its tidyverse dependencies. We could consider helping to reduce those if @mjskay would be interested in that sort of contribution.

bwiernik avatar May 31 '21 22:05 bwiernik

The big issue with ggdist (which I love dearly) is its tidyverse dependencies. We could consider helping to reduce those if @mjskay would be interested in that sort of contribution.

I'm happy to take contributions to ggdist that reduce deps. I compared the full dependency tree (direct and indirect) of {ggdist} in the current development version with that of {see} in the current CRAN version. Here is the difference:

> ggdist_deps = pak::pkg_deps(".")   # executed on ggdist dev version
> see_deps = pak::pkg_deps("see")
> setdiff(ggdist_deps$package, see_deps$package)
 [1] "ggdist"         "HDInterval"     "distributional" "dplyr"          "forcats"        "generics"      
 [7] "numDeriv"       "purrr"          "tidyr"          "tidyselect" 

It turns out {see} already depends on a bunch of tidyverse stuff (like vctrs, tibble, rlang, glue, etc) directly or indirectly, so there aren't that many extra deps from ggdist's dependency tree (coincidentally, both packages have exactly 31 direct or indirect dependencies). A few of the extra deps ggdist has are not easily worked around:

  • "HDInterval" is for calculating highest-density intervals. In principle this is basically one function and could be duplicated into ggdist, but I don't much see the upside of maintaining it myself.
  • "distributional" is needed for the stat_dist_... family and can't be removed. It also brings in "generics" and "numDeriv" (those aren't direct dependencies of ggdist).
  • "dplyr" is needed because several functions in ggdist support grouped tibbles, notably the point_interval() family. This also brings in "tidyselect".

That leaves "forcats", "purrr", and "tidyr", all of which I suspect could be removed with varying levels of effort if someone wanted to take a stab at it. If there's interest I'd say open an issue on the ggdist repo and I'm happy to chat :).

As an aside, if there's interest in adopting {ggdist} in {see} in some capacity or other, I'd also be happy to chat about if there are missing distributional visualization types that could be helpful. {ggdist} is intended to be quite flexible and general with respect to distribution visualization so if there's something you can't do I'd like to know about it :).

mjskay avatar Jun 01 '21 02:06 mjskay

@mattansb It would be really cool to support analytic uncertainty distribution visualizations with bayestestR.

bwiernik avatar Jun 01 '21 13:06 bwiernik

@bwiernik Like some advanced version of stat_summary(fun.data = mean_cl_normal)?

mattansb avatar Jun 01 '21 13:06 mattansb

À la https://mjskay.github.io/ggdist/articles/freq-uncertainty-vis.html, so add methods for the various posterior visualizations in bayestestR that work for frequentist/MLE models using analytic distributions.

bwiernik avatar Jun 01 '21 14:06 bwiernik

Are we talking about making a geom/stat? Or adding these options to the plotting methods?

mattansb avatar Jun 01 '21 14:06 mattansb

ggdist has already done much of that work; for example, I often use stat_dist_slabinterval() in my work. I'm thinking of adding something like plot.see_dist_ci() that would produce a confidence distribution visualization similar to plot.see_ci(), using, for example, ggdist::stat_dist_slabinterval().
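
To make the idea concrete, here is a minimal sketch of a confidence distribution plot for a frequentist model, following the approach in the ggdist vignette linked above. It is only an illustration under some assumptions: `plot.see_dist_ci()` does not exist, the model and the names `estimates` and `p_conf_dist` are made up for the example, and newer ggdist versions may prefer `stat_slabinterval()` with distribution aesthetics over the `stat_dist_` family.

```r
library(ggplot2)
library(ggdist)

# Example model; any lm() fit would do
model <- lm(mpg ~ wt + hp, data = mtcars)

# Collect estimates, standard errors, and residual df without extra packages
coefs <- as.data.frame(coef(summary(model)))
estimates <- data.frame(
  term = rownames(coefs),
  estimate = coefs[["Estimate"]],
  std_error = coefs[["Std. Error"]],
  df = df.residual(model)
)
estimates <- estimates[estimates$term != "(Intercept)", ]

# Scaled and shifted Student t confidence distribution per coefficient:
# arg1 = degrees of freedom, arg2 = location, arg3 = scale
p_conf_dist <- ggplot(estimates, aes(y = term)) +
  ggdist::stat_dist_slabinterval(
    aes(dist = "student_t", arg1 = df, arg2 = estimate, arg3 = std_error),
    .width = c(0.66, 0.95)
  ) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  labs(x = "Coefficient", y = NULL)

p_conf_dist
```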

bwiernik avatar Jun 01 '21 16:06 bwiernik

Agree! (Don't know why you tagged me, but I agree 😉) - what say you @DominiqueMakowski ?

mattansb avatar Jun 10 '21 06:06 mattansb

I agree

DominiqueMakowski avatar Jun 10 '21 07:06 DominiqueMakowski

Don't know why you tagged me

Because you are amazing, brah https://github.com/easystats/modelbased/issues/119#issuecomment-856466664

IndrajeetPatil avatar Jun 10 '21 07:06 IndrajeetPatil

Updating geom_violinhalf (or simply removing) using ggdist's geom_halfeye

Why not use ggridges::geom_density_ridges?

We already rely on ggridges, so we don't even need to gain an additional dependency.

library(ggridges)
library(ggplot2)

ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  geom_density_ridges(
    # points
    jittered_points = TRUE,
    position = position_raincloud(
      adjust_vlines = TRUE,
      width = 0.02,
      height = 0.2
    ),
    point_size = 2,
    point_alpha = 0.5,
    quantile_lines = TRUE,
    # density
    scale = 0.7,
    alpha = 0.5,
    # quantile lines
    vline_size = 1,
    vline_color = "red"
  ) +
  coord_flip()
#> Picking joint bandwidth of 0.181

Created on 2021-06-10 by the reprex package (v2.0.0)

IndrajeetPatil avatar Jun 10 '21 08:06 IndrajeetPatil

We could switch the dependency to ggdist. It would give us a lot more flexibility, especially for analytic distributions.

bwiernik avatar Jun 10 '21 08:06 bwiernik

I think it's worth the change

DominiqueMakowski avatar Jun 10 '21 09:06 DominiqueMakowski