POC bin breaks derived from scale breaks

Open arcresu opened this issue 1 year ago • 13 comments

This is a proof of concept of the minimal changes necessary to fix #6159. If you're willing to consider this approach I'll finish it off with documentation, tests, and the outstanding TODOs below.

The first part is essentially the same as the extension discussed in the issue: the follow.scale param on stat_bin causes it to inherit its bins from the scale's breaks. As noted there, that only works if the scale doesn't get new breaks during the final retraining, which means either providing fixed breaks, or disabling scale expansion and hoping other layers don't cause issues. In the example below, the bins don't align with the final breaks because the scale expands after the binning, causing the breaks to move.

(TODO: add follow.scale to the other binning stats; suppress the default binning message when follow.scale = TRUE; perhaps add a value like follow.scale = "minor" to allow inheriting both major and minor breaks.)

devtools::load_all("~/code/ggplot2")
#> ℹ Loading ggplot2

set.seed(2024)
df <- data.frame(
  date = as.Date("2024-01-01") + rnorm(100, 0, 5),
  z = sample(c("a", "b"), 100, replace = TRUE)
)

ggplot(df, aes(date)) +
  geom_histogram(follow.scale = TRUE)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
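
For comparison, the "fixed breaks" workaround mentioned above can be sketched with released ggplot2, without follow.scale. This is only an illustration under my own assumptions: week_breaks is a name made up here, and its weekly boundaries were picked by hand to span roughly four standard deviations either side of the mean. The same vector is passed to both the stat and the scale, so retraining has nothing to recompute.

```r
library(ggplot2)

set.seed(2024)
df <- data.frame(
  date = as.Date("2024-01-01") + rnorm(100, 0, 5)
)

# Assumption: hand-picked weekly boundaries meant to cover the simulated data.
week_breaks <- seq(as.Date("2023-12-11"), as.Date("2024-01-22"), by = "week")

p <- ggplot(df, aes(date)) +
  # The same vector defines the bin edges...
  geom_histogram(breaks = week_breaks) +
  # ...and the axis breaks, so scale expansion cannot move them apart.
  scale_x_date(breaks = week_breaks)

p
```

The drawback is that the breaks have to be chosen up front and repeated in two places, which is what follow.scale is trying to avoid.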

The fix is to tell the scale that we want the breaks to be "frozen" before the stats are computed. Subsequent retraining is free to change the limits, which affects which breaks are shown, but once the breaks are frozen it acts as though they had been passed in as an explicit breaks vector.
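
As a rough sketch of the intended semantics (plain R, not the code in this PR; make_frozen_breaks and even_breaks are hypothetical names used only for illustration), freezing amounts to computing the breaks once and afterwards only filtering them by the current limits:

```r
# Hypothetical helper for illustration only; not part of ggplot2 or this PR.
# Wraps a break function so that breaks are computed once, then only filtered.
make_frozen_breaks <- function(break_fun) {
  frozen <- NULL
  function(limits) {
    if (is.null(frozen)) {
      # First call: compute and remember the breaks.
      frozen <<- break_fun(limits)
    }
    # Later calls: the limits only decide which frozen breaks are shown.
    frozen[frozen >= limits[1] & frozen <= limits[2]]
  }
}

# Toy break function: five evenly spaced breaks across the limits.
even_breaks <- function(limits) seq(limits[1], limits[2], length.out = 5)

get_breaks <- make_frozen_breaks(even_breaks)
before_stat  <- get_breaks(c(0, 8))   # breaks are computed and frozen here
after_expand <- get_breaks(c(-1, 9))  # limits grew, breaks stay where they were
unfrozen     <- even_breaks(c(-1, 9)) # recomputing instead would shift the breaks
```

Here after_expand matches before_stat, while unfrozen does not, which is the misalignment seen in the first example.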

(TODO: Add a param to the continuous scale constructor and scale_{x,y}_{continuous,date,datetime}. Maybe come up with a better name than freezing, like breaks_computation = c("auto", "before_stat"))


ggplot(df, aes(date)) +
  geom_histogram(follow.scale = TRUE) +
  ggproto(NULL, scale_x_date(), freeze_breaks = TRUE)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

It also looks reasonable when there are multiple facets:


ggplot(df, aes(date)) +
  geom_histogram(follow.scale = TRUE) +
  facet_wrap(vars(z)) +
  ggproto(NULL, scale_x_date(), freeze_breaks = TRUE)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Adding a distant data point and setting the scales to free, we can see that the binning is done independently for different facets:


rbind(df, data.frame(date = as.Date("2025-03-01"), z = "a")) |>
  ggplot(aes(date)) +
  geom_histogram(follow.scale = TRUE) +
  facet_wrap(vars(z), scales = "free_x") +
  ggproto(NULL, scale_x_date(), freeze_breaks = TRUE, guide = guide_axis(angle = 90))
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2024-11-01 with reprex v2.1.1

This is probably the desirable behaviour, since we explicitly requested free scales here. Making the binning consistent across panels would also be a bit complicated, because I think the facets clone the scales before the breaks are first computed.

To make the combination of settings more discoverable, it's probably reasonable to add a warning when using follow.scale with a scale that doesn't have freeze_breaks = TRUE.

Please let me know if I've overlooked some way in which these changes could cause problems with other parts of ggplot2!

arcresu, Nov 01 '24 06:11