ggstats
Suggested enhancement: Allow for binning within `stat_prop`
I'd love to use `stat_prop` to normalize proportions, but within bins (as computed by `stat_bin`).
The example below shows what I'd like to achieve and why. I believe `ggstats::stat_prop` is very close to what's needed, but I wonder whether it's possible to combine it with `ggplot2::stat_bin`, specifically its ability to reduce a continuous variable to bins of a given `binwidth`.
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
library(ggplot2)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# set up data
laker_player_plays = lakers |>
  tibble::as_tibble() |>
  filter(team == 'LAL', stringr::str_length(player) > 0) |>
  mutate(date = ymd(date))
# calculate breaks, for solutions that can't use stat_bin
# (extend one step past the last date so cut() does not turn the final days into NA)
breaks = seq(min(laker_player_plays$date), max(laker_player_plays$date) + 31, by = 31)
# Desired output, currently achievable only by pre-processing the data:
laker_player_plays |>
  mutate(date_group = cut(date, breaks = breaks)) |>
  group_by(player, date_group) |>
  count(name = 'plays') |>
  group_by(date_group) |>
  mutate(proportion_of_plays = plays / sum(plays)) |>
  ggplot(aes(x = date_group,
             y = proportion_of_plays,
             color = player,
             group = player)) +
  geom_point() +
  geom_line() +
  scale_y_continuous(labels = scales::percent)
Desired Output

# closest you can get from pure ggplot2: abandoning the lines, use geom_histogram + position = 'fill'
# advantage: binwidth processed during stat
ggplot(laker_player_plays) +
  geom_histogram(aes(x = date, fill = player), position = 'fill', binwidth = 31)
Best possible with pure ggplot2

# Using ggstats::stat_prop to normalize the proportions, but pre-bin the x axis
# advantage: counts normalized during stat
# disadvantage: binning must occur beforehand
laker_player_plays |>
  mutate(date_group = cut(date, breaks = breaks)) |>
  ggplot() +
  ggstats::stat_prop(aes(x = date_group,
                         by = date_group,
                         group = player,
                         color = player,
                         y = after_stat(prop)),
                     position = 'identity',
                     geom = 'line') +
  scale_y_continuous(labels = scales::percent)
Great ability using ggstats, but lacking the ability to do binning

# Desired capability (stat_prop_by_bin does not exist yet):
# laker_player_plays |>
#   ggplot() +
#   ggstats::stat_prop_by_bin(aes(x = date,
#                                 group = player,
#                                 color = player,
#                                 y = after_stat(prop)),
#                             binwidth = 31,
#                             position = 'identity',
#                             geom = 'line') +
#   scale_y_continuous(labels = scales::percent)
Created on 2025-05-22 with reprex v2.1.1
Dear @kieran-mace
This seems related to https://github.com/tidyverse/ggplot2/issues/6478
Would that be enough to achieve what you are looking for?
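In the meantime, here is a possible workaround sketch using only `ggplot2::stat_bin`. It relies on two assumptions that are worth checking against your ggplot2 version: that `after_stat()` is evaluated on the computed data of all groups at once, and that every group gets the same bin centres `x` (which should hold when they share one `binwidth` on one scale). Under those assumptions, `ave(count, x, FUN = sum)` gives the total count per bin, so dividing by it normalizes within each bin. The object name `p` is just illustrative.

```r
library(lubridate)  # provides the `lakers` data set and ymd()
library(ggplot2)
library(dplyr)

laker_player_plays = lakers |>
  tibble::as_tibble() |>
  filter(team == 'LAL', stringr::str_length(player) > 0) |>
  mutate(date = ymd(date))

p = ggplot(laker_player_plays,
           aes(x = date, color = player, group = player)) +
  geom_line(
    # count per player and bin, divided by the total count of that bin
    # (assumes a single panel; bins with a total of 0 would give NaN)
    aes(y = after_stat(count / ave(count, x, FUN = sum))),
    stat = 'bin',
    binwidth = 31
  ) +
  scale_y_continuous(labels = scales::percent)

p
```

Unlike the pre-processing version, players with no plays in a bin are drawn at 0% rather than left out, because `stat_bin` returns a row for every bin in every group. The bins are also aligned by `stat_bin`'s own defaults, so their edges may not match the manual `breaks`.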