ggplot2
Feature request: quantiles based on observations instead of density estimates with geom_violin
geom_violin draws its quantile lines from the density estimate rather than from the observations, so they do not line up with the quartiles shown by a box plot of the same data.
This has been asked before and is documented: https://stackoverflow.com/questions/36033341/differing-quantiles-boxplot-vs-violinplot (issue: #2088)
I think it would be useful to have an optional parameter that computes the quantiles from the actual observations instead of the density estimate. Violin plots are useful for seeing the rough distribution of the data, but the quantiles of the actual observations also matter. Therefore, I think it would be great to have such an option without breaking existing behaviour.
PS: reprex didn't work for me, but here is a simple reproducible example.
library(ggplot2)
set.seed(5)
type1 = rnorm(n = 20, mean = 5)
type2 = c(10, 9, 9, 9, 9, 7, 7, 6, 5, 4, 1)
type3 = c(rnorm(n = 10, mean = 2.5), rnorm(n = 10, mean = 7.5, sd = 0.5))
# median(type1)
# median(type2)
df = data.frame(type = c(rep("Type 1", length(type1)), rep("Type 2", length(type2)), rep("Type 3", length(type3))),
                val = c(type1, type2, type3))
ggplot(df, aes(x = type, y = val)) + theme_classic() +
  geom_boxplot(alpha = 0.5) +
  geom_violin(scale = "area", alpha = 0.5, draw_quantiles = c(0.25, 0.5, 0.75)) +
  geom_dotplot(binaxis = "y", stackdir = "center", alpha = 0.3, dotsize = 0.4)
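To make the discrepancy concrete, here is a minimal base R sketch that compares the two kinds of quantiles for the "Type 2" values. The density-based calculation (normalised cumulative sum of a trimmed kernel density estimate, then interpolation) is only an approximation of what geom_violin appears to do, not ggplot2's actual code; the variable names are made up for illustration.

```r
# Observed values for "Type 2" from the example
x <- c(10, 9, 9, 9, 9, 7, 7, 6, 5, 4, 1)
probs <- c(0.25, 0.5, 0.75)

# Quantiles of the actual observations
obs_q <- quantile(x, probs = probs, names = FALSE)

# Quantiles of a kernel density estimate trimmed to the data range
# (assumption: default bandwidth and a simple discrete CDF)
d <- density(x, from = min(x), to = max(x))
cdf <- cumsum(d$y) / sum(d$y)
dens_q <- approx(cdf, d$x, xout = probs, ties = "ordered")$y

# Side-by-side comparison
print(data.frame(prob = probs, observed = obs_q, density_based = dens_q))
```

With few observations, or with heavily tied values as here, the two columns can differ noticeably, which is the mismatch visible in the plot.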
We do not have the development bandwidth to work on such a feature at the moment, but if you want to create a PR, I will be happy to review it.
Yes, I can look into this and create a PR.
I think explicitly pointing this out in the documentation would be helpful. The current line in the docs is
draw_quantiles | If not(NULL) (default), draw horizontal lines at the given quantiles of the density estimate.
Maybe just adding a parenthetical note would make this more explicit:
draw_quantiles | If not(NULL) (default), draw horizontal lines at the given quantiles of the density estimate (these don't necessarily correspond to the quantiles of the actual data).
If we ensure the quantiles are included at the stat level and have a computed variable column that keeps track of which values to display as quantiles, I could see how to make this work. However, I don't think the solution will be very elegant.
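Until something like this exists at the stat level, one possible workaround is to compute the observed quantiles yourself and overlay them on the violins. The sketch below is only an illustration: `obs_q` and `q_list` are hypothetical helper names, and drawing the lines with geom_errorbar (with ymin equal to ymax) is one choice among several.

```r
library(ggplot2)

set.seed(5)
types <- list(
  rnorm(n = 20, mean = 5),
  c(10, 9, 9, 9, 9, 7, 7, 6, 5, 4, 1),
  c(rnorm(n = 10, mean = 2.5), rnorm(n = 10, mean = 7.5, sd = 0.5))
)
df <- data.frame(
  type = rep(paste0("Type ", seq_along(types)), lengths(types)),
  val = unlist(types)
)

# Quantiles of the actual observations, one row per group and probability
probs <- c(0.25, 0.5, 0.75)
q_list <- tapply(df$val, df$type, quantile, probs = probs)
obs_q <- data.frame(
  type = rep(names(q_list), each = length(probs)),
  prob = rep(probs, times = length(q_list)),
  val = unlist(q_list, use.names = FALSE)
)

# Violins without draw_quantiles, plus horizontal lines at the observed quantiles
p <- ggplot(df, aes(x = type, y = val)) + theme_classic() +
  geom_violin(scale = "area", alpha = 0.5) +
  geom_errorbar(data = obs_q, aes(ymin = val, ymax = val),
                width = 0.4, colour = "blue")
print(p)
```

Unlike the lines from draw_quantiles, these have a fixed width and do not follow the violin outline, which is part of why a stat-level solution would be cleaner.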
Just a slightly adapted render of the reprex:
library(ggplot2)
set.seed(5)
types <- list(
  rnorm(n = 20, mean = 5),
  c(10, 9, 9, 9, 9, 7, 7, 6, 5, 4, 1),
  c(rnorm(n = 10, mean = 2.5), rnorm(n = 10, mean = 7.5, sd = 0.5))
)
df = data.frame(
  type = rep(paste0("Type ", seq_along(types)), lengths(types)),
  val = unlist(types)
)
ggplot(df, aes(x = type, y = val)) + theme_classic() +
  geom_boxplot(alpha = 0.5) +
  geom_violin(scale = "area", alpha = 0.5, draw_quantiles = c(0.25, 0.5, 0.75),
              colour = "red") +
  geom_dotplot(binaxis = "y", stackdir = "center", alpha = 0.3, dotsize = 0.4)
#> Bin width defaults to 1/30 of the range of the data. Pick better value with
#> `binwidth`.

Created on 2024-05-28 with reprex v2.1.0