
Add 'by' argument to plot_histogram and plot_density

Open seb-mueller opened this issue 1 year ago • 3 comments

Similar to #139, it would be great to have the by argument for those functions. This would be particularly useful for multimodal data, for which boxplots are limited.

seb-mueller, Apr 03 '24 08:04

Hi @seb-mueller, thanks for your suggestion, but I am a little confused by the request. A histogram is basically a univariate frequency counter, and I don't see how that changes for multimodal data. Given that, I don't see how a by argument can apply here. Would you mind creating an example to further illustrate your idea? Thanks!

boxuancui, Apr 03 '24 19:04

Thanks for getting back. Let's use the built-in mpg dataset as an example. Say I want to compare the distributions of all covariates, but for each fl. Naively, I'd do something like plot_density(mpg, by = "fl"). I've copied some native ggplot code to make the case for just the cty covariate, but I'd like to get the same for all covariates (colored or split up by "fl"):

library(ggplot2)
library(DataExplorer)
plot_density(mpg, ncol = 2L)


ggplot(mpg, aes(x = cty, fill = fl)) +
  geom_density()
#> Warning: Groups with fewer than two data points have been dropped.
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf


ggplot(mpg, aes(x = cty, color = fl)) +
  geom_density()
#> Warning: Groups with fewer than two data points have been dropped.
#> Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
#> -Inf

Created on 2024-04-03 with reprex v2.1.0

seb-mueller, Apr 03 '24 21:04

Thanks for the detailed explanation. That makes a lot of sense now.

boxuancui, Apr 03 '24 21:04
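
For anyone reading along, here is a minimal sketch of what the request describes, written with plain ggplot2 and base R rather than any DataExplorer API. The helpers stack_by() and density_by() are hypothetical names made up for this illustration; they are not functions in DataExplorer, and this is not necessarily how a by argument would be implemented there. The idea is to stack every numeric column into long form, keep the grouping column alongside, and draw one density panel per variable, colored by group.

```r
library(ggplot2)

# Hypothetical helper (not part of DataExplorer): stack all numeric columns
# of `data` into long form, carrying the grouping column `by` along.
stack_by <- function(data, by) {
  is_num <- vapply(data, is.numeric, logical(1))
  num_cols <- setdiff(names(data)[is_num], by)
  data.frame(
    variable = rep(num_cols, each = nrow(data)),
    value = as.numeric(unlist(data[num_cols], use.names = FALSE)),
    group = rep(data[[by]], times = length(num_cols)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper (not part of DataExplorer): one density panel per
# numeric variable, with one curve per level of `by`.
density_by <- function(data, by, ncol = 2L) {
  long <- stack_by(data, by)
  ggplot(long, aes(x = value, colour = group)) +
    geom_density(na.rm = TRUE) +
    facet_wrap(~ variable, scales = "free", ncol = ncol) +
    labs(colour = by)
}

long <- stack_by(mpg, by = "fl")
p <- density_by(mpg, by = "fl")
```

Printing p draws the panels. As in the reprex above, ggplot2 will likely warn that groups with fewer than two data points were dropped, since some fl levels are very sparse. Swapping colour for fill in aes() gives the filled variant from the first example.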