ggplot2

Feature request: quantiles based on observations instead of density estimates with geom_violin

Open SarenT opened this issue 5 years ago • 5 comments

geom_violin draws its quantile lines from the kernel density estimate, so they don't line up with the box plot, whose hinges and median are quantiles of the actual observations.
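To make the mismatch concrete, here is a minimal base-R sketch that compares the sample quantiles (the kind of statistic `geom_boxplot()` uses for its hinges and median) with quantiles read off a kernel density estimate, using the Type 2 values from the reprex below. The density step only approximates the idea behind `draw_quantiles`: it assumes `stats::density()` with its defaults, not the exact bandwidth or trimming that `geom_violin()` applies, so the numbers will not match the plotted lines exactly.

```r
# Type 2 values from the reprex in this issue
x <- c(10, 9, 9, 9, 9, 7, 7, 6, 5, 4, 1)
probs <- c(0.25, 0.5, 0.75)

# Quantiles of the observations (stats::quantile, default type 7)
obs_q <- quantile(x, probs, names = FALSE)

# Quantiles of a Gaussian kernel density estimate: accumulate the density
# over its evaluation grid into an approximate CDF, then invert that CDF
# by linear interpolation. Assumption: stats::density() defaults.
d <- density(x)
cdf <- cumsum(d$y) / sum(d$y)
dens_q <- approx(cdf, d$x, xout = probs)$y

print(rbind(observed = obs_q, density = round(dens_q, 2)))
```

The observed row is what the box plot shows; the density row is the kind of value a density-based quantile line lands on, and with only 11 observations the two can differ noticeably.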

This has already been asked and is documented on Stack Overflow (https://stackoverflow.com/questions/36033341/differing-quantiles-boxplot-vs-violinplot) and in issue #2088.

I think it would be useful to have an optional parameter that computes the quantiles from the actual observations instead of from the density estimate. Violin plots are useful for seeing the rough distribution of the data, but the quantiles of the actual observations also matter. It would be great to have such an option without breaking the existing behaviour.
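Until such an option exists, one possible workaround is to compute the observed quantiles yourself and draw them on top of a violin that has no `draw_quantiles`. This is a sketch, not ggplot2 functionality: `quantile_segments()` is a hypothetical helper, the example data reuse the first two groups of the reprex below, and drawing each line at a fixed half-width of 0.3 is a simplification rather than spanning the violin's width at that height as `draw_quantiles` does.

```r
library(ggplot2)

# Hypothetical helper: per-group quantiles of the observations,
# returned in long format (one row per group and probability)
quantile_segments <- function(data, probs = c(0.25, 0.5, 0.75)) {
  groups <- split(data$val, data$type)
  rows <- lapply(names(groups), function(g) {
    data.frame(type = g, prob = probs,
               val = quantile(groups[[g]], probs, names = FALSE))
  })
  do.call(rbind, rows)
}

# Example data: Type 1 and Type 2 from the reprex
set.seed(5)
df <- data.frame(
  type = rep(c("Type 1", "Type 2"), c(20, 11)),
  val  = c(rnorm(20, mean = 5), c(10, 9, 9, 9, 9, 7, 7, 6, 5, 4, 1))
)
qs <- quantile_segments(df)

# A discrete x scale places groups at 1, 2, ... in level order, which is
# also the order produced by factor() here, so segments line up with violins
qs$x <- as.numeric(factor(qs$type))

p <- ggplot(df, aes(x = type, y = val)) +
  theme_classic() +
  geom_violin(scale = "area", alpha = 0.5) +
  geom_segment(data = qs,
               aes(x = x - 0.3, xend = x + 0.3, y = val, yend = val),
               inherit.aes = FALSE)
print(p)
```

Because the quantiles come from `stats::quantile()`, the lines coincide with the hinges and median of a `geom_boxplot()` layer on the same data.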

PS: reprex didn't work for me, but here is a simple reproducible example.

```r
library(ggplot2)
set.seed(5)
type1 <- rnorm(n = 20, mean = 5)
type2 <- c(10, 9, 9, 9, 9, 7, 7, 6, 5, 4, 1)
type3 <- c(rnorm(n = 10, mean = 2.5), rnorm(n = 10, mean = 7.5, sd = 0.5))

# Long-format data: one row per observation, labelled by type
df <- data.frame(type = c(rep("Type 1", length(type1)), rep("Type 2", length(type2)), rep("Type 3", length(type3))),
                 val = c(type1, type2, type3))

# Box plot (quantiles of the observations) overlaid with a violin whose
# quantile lines come from the density estimate, plus the raw points
ggplot(df, aes(x = type, y = val)) + theme_classic() +
  geom_boxplot(alpha = 0.5) +
  geom_violin(scale = "area", alpha = 0.5, draw_quantiles = c(0.25, 0.5, 0.75)) +
  geom_dotplot(binaxis = "y", stackdir = "center", alpha = 0.3, dotsize = 0.4)
```

SarenT, Jul 06 '20 14:07

We do not have the development bandwidth to work on such a feature at the moment, but if you want to create a PR, I will be happy to review it.

thomasp85, Aug 31 '20 11:08

Yes I can look into this and create a PR.

SarenT, Sep 07 '20 09:09

I think explicitly pointing this out in the documentation would be helpful. The current line in the docs is:

draw_quantiles | If not(NULL) (default), draw horizontal lines at the given quantiles of the density estimate.

Maybe just adding a parenthetical note would make this more explicit:

draw_quantiles | If not(NULL) (default), draw horizontal lines at the given quantiles of the density estimate (these don't necessarily correspond to the quantiles of the actual data).

joelostblom, Oct 16 '21 20:10

If we ensure the quantiles are included at the stat level and have a computed variable column that keeps track of which values to display as quantiles, I could see how to make this work. However, I don't think the solution will be very elegant.
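A rough sketch of that idea in plain R, outside ggplot2's ggproto machinery: a hypothetical `density_with_quantiles()` returns the density grid for one group, splices in rows at the observed quantiles, and flags them with a logical `is_quantile` column that a geom could use to decide where to draw lines. The function and column names are illustrative only, and this is not `stat_ydensity()`'s actual code.

```r
# Hypothetical stat-level helper for a single group of y values
density_with_quantiles <- function(y, probs = c(0.25, 0.5, 0.75), n = 512) {
  # Evaluate the density only over the data range, similar in spirit
  # to geom_violin()'s default trim = TRUE
  d <- density(y, n = n, from = min(y), to = max(y))
  grid <- data.frame(y = d$x, density = d$y, is_quantile = FALSE)

  # Observed quantiles, with the density interpolated at those heights
  # so the flagged rows sit on the violin outline
  q <- quantile(y, probs, names = FALSE)
  qrows <- data.frame(y = q,
                      density = approx(d$x, d$y, xout = q)$y,
                      is_quantile = TRUE)

  # Merge and sort by y so the outline is still drawn in order
  out <- rbind(grid, qrows)
  out <- out[order(out$y), ]
  rownames(out) <- NULL
  out
}

res <- density_with_quantiles(c(10, 9, 9, 9, 9, 7, 7, 6, 5, 4, 1))
res[res$is_quantile, ]
```

Keeping the quantile rows in the same data frame as the outline means the geom needs no extra computation, at the cost of mixing two kinds of rows in one table, which is part of why the approach is not especially elegant.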

teunbrand, Mar 06 '23 17:03

Just a slightly adapted render of the reprex:

```r
library(ggplot2)
set.seed(5)

types <- list(
  rnorm(n = 20, mean = 5),
  c(10, 9, 9, 9, 9, 7, 7, 6, 5, 4, 1),
  c(rnorm(n = 10, mean = 2.5), rnorm(n = 10, mean = 7.5, sd = 0.5))
)

df <- data.frame(
  type = rep(paste0("Type ", seq_along(types)), lengths(types)),
  val  = unlist(types)
)

ggplot(df, aes(x = type, y = val)) + theme_classic() +
  geom_boxplot(alpha = 0.5) +
  geom_violin(scale = "area", alpha = 0.5, draw_quantiles = c(0.25, 0.5, 0.75),
              colour = "red") +
  geom_dotplot(binaxis = "y", stackdir = "center", alpha = 0.3, dotsize = 0.4)
#> Bin width defaults to 1/30 of the range of the data. Pick better value with
#> `binwidth`.
```

Created on 2024-05-28 with reprex v2.1.0

teunbrand, May 28 '24 15:05