
Suggested enhancement: Allow for binning within `stat_prop`

Open • kieran-mace opened this issue 6 months ago • 1 comment

I'd love to be able to use `stat_prop` to normalize proportions within a bin, with the bins calculated the way `stat_bin` calculates them.

The example below shows why, and what I'd like to achieve. I believe `ggstats::stat_prop` is very close to what's needed, but I wonder if it's possible to combine it with the abilities of `ggplot2::stat_bin`, specifically reducing a continuous variable by binning it with a `binwidth`.

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(ggplot2) 
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# set up data
laker_player_plays = lakers |> 
  tibble::as_tibble() |> 
  filter(team == 'LAL', stringr::str_length(player) > 0) |> 
  mutate(date = ymd(date))

# calculate breaks, for solutions that can't use stat_bin
# (extend one bin past the max so cut() does not turn the last dates into NA)
breaks = seq(min(laker_player_plays$date), max(laker_player_plays$date) + 31, by = 31)

# Desired output, achievable only by pre-processing the data:
laker_player_plays |> 
  mutate(date_group = cut(date, breaks = breaks)) |>
  group_by(player, date_group) |> 
  count(name = 'plays') |> 
  group_by(date_group) |> 
  mutate(proportion_of_plays = plays/sum(plays)) |> 
  ggplot(aes(x = date_group, 
             y = proportion_of_plays,
             color = player,
             group = player)) +
  geom_point() +
  geom_line() +
  scale_y_continuous(labels=scales::percent)

Desired Output


# closest you can get from pure ggplot2: abandoning the lines, use geom_histogram + position = 'fill'
# advantage: binwidth processed during stat
ggplot(laker_player_plays) +
  geom_histogram(aes(x = date, fill = player), position = 'fill', binwidth = 31)

Best possible with pure ggplot2



# Using ggstats::stat_prop to normalize the proportions, but pre-binning the x axis
# advantage: counts normalized during stat
# disadvantage: binning must occur before plotting
laker_player_plays |> 
  mutate(date_group = cut(date, breaks = breaks)) |>
ggplot() +
  ggstats::stat_prop(aes(x = date_group, 
                         by = date_group, 
                         group = player,
                         color = player, 
                         y = after_stat(prop)),
                     position = 'identity',
                     geom = 'line') +
  scale_y_continuous(labels=scales::percent)

Great ability using ggstats, but lacking the ability to do binning



# Desired capability:
# laker_player_plays |>
#   ggplot() +
#   ggstats::stat_prop_by_bin(aes(x = date,
#                                 group = player,
#                                 color = player,
#                                 y = after_stat(prop)),
#                             binwidth = 31,
#                             position = 'identity',
#                             geom = 'line') +
#   scale_y_continuous(labels=scales::percent)

Created on 2025-05-22 with reprex v2.1.1
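
For reference, the bin-then-normalize step that the pre-processing pipeline performs can be factored into a small helper. This is only a minimal sketch in base R: `prop_by_bin()` is a hypothetical name, not a function in ggstats or ggplot2, and it assumes left-closed bins anchored at the minimum of `x`, which matches the `cut()` approach above but not necessarily the bin alignment `stat_bin` would choose.

```r
# Hypothetical helper (not part of ggstats or ggplot2): bins a continuous `x`
# into fixed-width bins and returns each group's share of the rows in each bin.
prop_by_bin <- function(x, group, binwidth) {
  x_num <- as.numeric(x)                     # Dates become days since 1970-01-01
  origin <- min(x_num)
  bin <- floor((x_num - origin) / binwidth)  # 0-based index, bins are [start, start + binwidth)

  # Count rows for every observed bin/group combination
  counts <- as.data.frame(table(bin = bin, group = group),
                          responseName = "n", stringsAsFactors = FALSE)
  counts$bin <- as.numeric(counts$bin)
  counts$bin_start <- origin + counts$bin * binwidth

  # Normalize within each bin so the proportions in a bin sum to 1
  counts$prop <- counts$n / ave(counts$n, counts$bin, FUN = sum)

  # Return bin starts as Dates when the input was a Date
  if (inherits(x, "Date")) {
    counts$bin_start <- as.Date(counts$bin_start, origin = "1970-01-01")
  }
  counts <- counts[order(counts$bin, counts$group), ]
  rownames(counts) <- NULL
  counts[, c("bin_start", "group", "n", "prop")]
}

# Toy data (made up for illustration): five plays by two players
toy <- data.frame(
  day = as.Date("2025-01-01") + c(0, 1, 2, 40, 41),
  player = c("A", "A", "B", "A", "B")
)
res <- prop_by_bin(toy$day, toy$player, binwidth = 31)
print(res)
```

The returned data frame could then be plotted with `geom_line(aes(x = bin_start, y = prop, color = group))`. It still bins before plotting, so it illustrates the computation a binned `stat_prop` would need rather than replacing the requested feature.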

kieran-mace, May 22 '25 05:05

Dear @kieran-mace,

This seems related to https://github.com/tidyverse/ggplot2/issues/6478

Would that be enough to achieve what you are looking for?

larmarange, May 23 '25 08:05