ggplot2 icon indicating copy to clipboard operation
ggplot2 copied to clipboard

Convert guides to ggproto

Open teunbrand opened this issue 2 years ago β€’ 14 comments

This is a WIP PR addressing #3329. Since the aim is a substantial overhaul of the guide system, it would be good to gather some feedback along the way. I was hoping we could align on some vision early on, which would make subsequent steps easier. For now, I made a first step by making a parent Guide class and GuideAxis to get the ball rolling. This step is small enough that it wouldn't matter if we decide on a totally different direction.

A few comments from my side that aren't in the code itself:

  • The general aim was to modularise code. I hope this can serve the purposes of countering code duplication and making it easier to extend.
  • The current GuideAxis is visually identical to the previous axis guide.
  • Also had to implement GuideNone to get this to work.
  • Two types of test are currently failing (locally):
    • Almost all visual tests. This is merely because the order of grobs in the axis has changed, which matters for the svg files, but not for the human eye. I can reverse-engineer the previous order if required.
    • One test for 'Using non-position guides for position scales results in an informative error', which should be resolved when guide_legend() is also converted.

The type of feedback I was hoping to get is (not limited to) the following:

  • General feedback about the design.
  • The naming of arguments, parameters, functions etc.
  • The 'statelessness' of ggproto objects. I know this should be pursued, but I couldn't immediately figure out how to transition the previously stateful S3 classes to a stateless ggproto class.
  • In the draw-method (previously the guide_gengrob() method), I just pass around the entire params list to every function that might use it. I don't know whether it is a bad thing, but if it might be useful if there was some elegant way to pass just the parameters it needs, without resorting to very verbose code.

teunbrand avatar Jun 18 '22 21:06 teunbrand

Hi @thomasp85, do you think you want to course-correct this WIP-PR at some point (whenever is convenient for you, I know these are busy times with the conference and all) or should I continue with the next guide? I was thinking about guide_legend next.

teunbrand avatar Jul 27 '22 18:07 teunbrand

I'll take a look next week as conference gets behind me. Thanks for reminding me

thomasp85 avatar Jul 27 '22 18:07 thomasp85

Thanks for taking a look, Thomas!

make the guide implementations themselves stateless

My intuition here would then be to tinker with the build_guides function to decompose the various states of the gdefs variable to explicitly pass around things like params, key and other parts that make the classes currently stateful. I'd have to figure out how this differs between position and non-position guides to see if we can reduce code duplication. Does that direction sound reasonable to you?

teunbrand avatar Aug 01 '22 11:08 teunbrand

Yes exactly. Having a clear expected output of a method makes it so much easier to build up intuition for what it is supposed to do. You may also think about having a Guides class that orchestrates it all and keeps all the stateful information but I'll leave that up to you

thomasp85 avatar Aug 02 '22 08:08 thomasp85

So I've updated the PR a bit. Main takeways:

  • There is now a GuidesList class wrapping the guides and their parameters. This works for position guides, should be expanded when also converting non-position guides. I didn't pick Guides as the name, to be consistent with the ScalesList class, but I'm not married to this idea.
  • The guides are no longer stateful themselves after construction, but GuidesList manages their parameters.
  • Labels are resolved better #4650
  • Secondary axis guides without explicit position arguments get opposite positions of primary guides https://github.com/tidyverse/ggplot2/issues/4650#issuecomment-1214136938

Open questions:

  • Should GuidesList take over the guides_train(), guides_geom() etc. functions?
  • I've read at various places that there are/were plans to convert the secondary axis system to be a guides only system. This is as good a time as any to make that switch, so is this something we'd like to pursue?
  • For #4907, how flexible should this be? In what I've written before, you could essentially make discontinuous axes with it, but I'm more than aware that this invites some abuse.
  • Is #4387 something we should consider for this PR?

teunbrand avatar Aug 13 '22 18:08 teunbrand

Great progress on this!

Yes, I think Guides (I think it is a better name πŸ™‚) should be the one orchestrating all the calls so guides_train() etc should become methods of this class. If it makes sense to convert the secondary guides at this point in time I think you should give it a go, but I wouldn't consider it a top priority. I also think that the various feature requests are something we can wait with as I perhaps see them as (at least #4907) as new guides rather than extensions to the existing axis guides.

The main focus, I think, is to think about how to merge this with legend guides and give that a stab

thomasp85 avatar Aug 24 '22 09:08 thomasp85

Thanks for taking a look Thomas! I'm happy to change GuidesList --> Guides. I'll work on incorporating the guides_train() methods etc as I convert legend guides next. For the axis guides the equivalent for these functions was very straightforward, but the legends have more bells and whistles to these methods, so it seemed more natural to incorporate the guides_*() functions along with the guides themselves.

teunbrand avatar Aug 24 '22 09:08 teunbrand

Sounds good - ping me whenever needing input

thomasp85 avatar Aug 24 '22 09:08 thomasp85

So this PR has been updated for a good chunk, here are the main takeaways (bat-signalling @thomasp85):

  • The Guides class now possesses methods previously carried out by the guides_*() family of functions. The guides_*() family of functions were deleted because they became redundant. Currently, these methods are bifurcated for the new ggproto Guide guides and the old S3 guide guides, but we can remove these branches after every guide is converted.
  • The guide_legend() has been converted to ggproto. There are no visual changes in the tests for legends. My goodness; the layout building was very complicated but I think I've ironed it out a bit.
  • While converting, I accidentally implemented #2728 because that made sense to me, but we could revert this if you're opposed. See plot below:
devtools::load_all("~/packages/ggplot2/")
#> β„Ή Loading ggplot2
ggplot(mpg, aes(displ, cty)) +
  geom_point(aes(colour = factor(cyl))) +
  guides(colour = guide_legend(
    title.theme = element_text(colour = "blue"),
    label.theme = element_text(colour = "red")
  )) +
  theme(legend.title = element_text(face = "bold.italic"),
        legend.text = element_text(angle = 45))

Created on 2022-09-04 by the reprex package (v2.0.1)

Some remaining things:

  • I think a virtual parent Guide class for all non-position guides might make sense, because there is quite a bit of code to share. Does that seem sensible? Maybe GuidePlot or GuideNonPosition?

Less pressing questions (not tagging related issues to avoid polluting other stuff to much):

  • I think it would make sense to make a title_snap argument that can be "legend", "labels" or "keys". Currently titles are aligned to the whole legend, but this can sometimes be a pain (issue 2465).
  • We could think about bestowing guide_axis()'s angle heuristics to others as well (issue 4594).
  • I would like to implement an argument that bypasses the spacing issues caused by the byrow argument (issue 4352). I think I've wrestled with the layout enough to now have a reasonable understanding of if. I think force_spacing would be a decent argument name.
  • I'd welcome your opinion on issue 4977, now is a good time to implement this.
  • Now would also be a good time to add an argument for issue 3669, though I'm not exactly sure what the correct approach would be (but I'm sure I could find a solution).

If you're nodding your head at these latest changes and have no somewhat pressing concerns at this point in time, I think the next thing to do is to convert guide_colourbar().

teunbrand avatar Sep 04 '22 12:09 teunbrand

So I've begun working on other guides, and came across the following where I think a visual change is desirable. For some reason, the guide becomes offset in a weird way when the labels in guide_bins() are longer than you might expect.

In the plot below, notice how the guide's axis roughly coinsides with y = 6.

library(ggplot2)
packageVersion("ggplot2") # current dev main branch
#> [1] '3.3.6.9000'

df <- data.frame(x = c(1, 2, 3),
                 y = c(6, 5, 7))

p <- ggplot(df, aes(x, y, size = x)) +
  geom_point() +
  guides(size = guide_bins(direction = "horizontal"))

# Short labels
p + scale_size_binned()

Now if we make the labels longer, it starts coinciding with y = 6.2-ish.

# Long labels
p + scale_size_binned(labels = c("long", "longer", "longest label"))

Created on 2022-09-12 by the reprex package (v2.0.1)

If you're opposed to this visual change, please let me know!

teunbrand avatar Sep 12 '22 20:09 teunbrand

That is obvious a bug and should be fixed. Haven't seen that beforeπŸ™‚

thomasp85 avatar Sep 13 '22 06:09 thomasp85

This PR has progressed a bit, so here are a few updates:

  • All guides are now converted.
  • At the moment, all the non-position guides inherit from GuideLegend. This is mostly because the layout code was complicated but flexible if we treat e.g. a colourbar as a 1-key legend.
  • I've taken some liberties with the styling items of guides that are not strictly part of the theme: you can now do for example guide_colourbar(frame = element_rect(...)) to style the frame. I tried to do this so that it doesn't break existing code.
  • For non-position guides that use axes, they now have a ticks.length argument.
  • Some visual snapshots needed an update, but I've checked them all to make sure that there are no visual changes (notable exception is guide_bins as described above).

Here is what I think still needs to be done:

  • Some polishing. E.g. I think that the text justification settings in legends are too complicated. Any other suggestions are appreciated!
  • Make it faster. I have the feeling that the system in this PR is a bit slower than the S3 system. I've tried profiling the code with {profvis}, but it is too fast to reliably identify bottlenecks. @thomasp85 do you know of any tips or tricks for profiling/writing faster code that I could apply here?
  • Write NEWS.md entry

Update: Speed difference when benchmarking old build_guides() and new build_guides() is negligible when using 4 guides (2 legends, 1 colourbar, 1 bins), so maybe it's the axis guides that are slower?

library(ggplot2)

ggplot(iris, aes(Species, Sepal.Width)) +
  geom_point(
    aes(size = Petal.Width, colour = Species, fill = Sepal.Length,
        shape = sample(c("A", "B"), nrow(iris), replace = TRUE))
  ) +
  scale_shape_manual(values = c(21, 24)) +
  scale_size_binned()

build <- ggplot_build(last_plot())
plot  <- build$plot

build_guides <- ggplot2:::build_guides
plot_theme   <- ggplot2:::plot_theme


warmup <- build_guides(plot$scales, plot$layers, plot$mapping, "right", plot_theme(plot), 
                       plot$guides, plot$labels)

bm1 <- bench::mark(
  "current" =  build_guides(plot$scales, plot$layers, plot$mapping, "right", plot_theme(plot), 
                            plot$guides, plot$labels),
  min_iterations = 100
)

rm(build_guides, plot_theme)

devtools::load_all("~/packages/ggplot2/")
#> β„Ή Loading ggplot2

ggplot(iris, aes(Species, Sepal.Width)) +
  geom_point(
    aes(size = Petal.Width, colour = Species, fill = Sepal.Length,
        shape = sample(c("A", "B"), nrow(iris), replace = TRUE))
  ) +
  scale_shape_manual(values = c(21, 24)) +
  scale_size_binned()

build <- ggplot_build(last_plot())
plot  <- build$plot

warmup <- build_guides(plot$scales, plot$layers, plot$mapping, "right", plot_theme(plot), 
                       plot$guides, plot$labels)

bm2 <- bench::mark(
  "ggproto_guides" =  build_guides(plot$scales, plot$layers, plot$mapping, "right", plot_theme(plot), 
                                   plot$guides, plot$labels), 
  min_iterations = 100
)

rbind(bm1, bm2)
#> # A tibble: 2 Γ— 6
#>   expression          min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 current          66.2ms   69.3ms      14.3     354KB    36.9 
#> 2 ggproto_guides   67.1ms   69.8ms      14.3     367KB     9.16

teunbrand avatar Sep 14 '22 11:09 teunbrand

It is great to see all the progress you are making... I'm in the midst of the next ggplot2 release so if it is ok with you I'll wait a bit for giving it a proper review.

As for benchmarking it, that can of course be problematic. Can you extract the guide pipeline and only benchmark this, running it repeatedly using bench? I think you'd only need to benchmark it up to the point where grabs have been created - avoid any actual plotting...

If you see that there is significant slowdown then profiling is another issue altogether that we can take if necessary

thomasp85 avatar Sep 14 '22 12:09 thomasp85

if it is ok with you I'll wait a bit for giving it a proper review.

Yep, that is totally fine; I understand.

Thanks for the tips on benchmarking, I'll try to see where it might be a bit slower!

teunbrand avatar Sep 14 '22 13:09 teunbrand

I've tried to speed up code where I can. Locally, the tests on the main branch run in 71.6 seconds, whereas on this PR they take 75.5 seconds (down from ~85 seconds previously). I've attached a benchmark to this message, and it seems the axis drawing bits are about as fast as the previous guide_genbrob.axis() method, but setting up the guides takes 20% longer. I'm struggling to find any parts that could go faster at the moment.


library(ggplot2)

df <- data.frame(
  x = 1:400
)

p <- ggplot(df, aes(x, x)) +
  geom_point() +
  facet_wrap(~ x, scales = "free")

b <- ggplot_build(p)

plot   <- b$plot
layout <- b$layout
data   <- b$data
theme  <- ggplot2:::plot_theme(plot)

bm_setup_1 <- bench::mark(
  main = layout$setup_panel_guides(plot$guides, plot$layers, plot$mapping),
  min_iterations = 10
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
bm_render_1 <- bench::mark(
  main = ggplot2:::render_axes(
    layout$panel_params, layout$panel_params,
    layout$coord, theme, transpose = TRUE
  ),
  min_iterations = 10
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.

devtools::load_all("~/packages/ggplot2")
#> β„Ή Loading ggplot2

df <- data.frame(
  x = 1:400
)

p <- ggplot(df, aes(x, x)) +
  geom_point() +
  facet_wrap(~ x, scales = "free")

b <- ggplot_build(p)

plot   <- b$plot
layout <- b$layout
data   <- b$data
theme  <- ggplot2:::plot_theme(plot)

bm_setup_2 <- bench::mark(
  pr = layout$setup_panel_guides(plot$guides, plot$layers, plot$mapping),
  min_iterations = 10
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
bm_render_2 <- bench::mark(
  pr = ggplot2:::render_axes(
    layout$panel_params, layout$panel_params,
    layout$coord, theme, transpose = TRUE
  ),
  min_iterations = 10
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.

rbind(bm_setup_1, bm_setup_2)
#> # A tibble: 2 Γ— 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 main          660ms    673ms      1.49    8.41MB    11.6 
#> 2 pr            800ms    818ms      1.23    8.32MB     4.17
rbind(bm_render_1, bm_render_2)
#> # A tibble: 2 Γ— 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 main          1.56s    1.68s     0.590    3.06MB     8.79
#> 2 pr            1.57s    1.69s     0.593    3.31MB     4.21

Created on 2022-11-13 by the reprex package (v2.0.1)

teunbrand avatar Nov 13 '22 12:11 teunbrand

Alright I think this ready for review now. A few ongoing thoughts:

  • I've tried to stick close to the pre-existing guide system and made a few improvements here and there where I think they wouldn't break any existing code. Therefore, any improvements that might cause user code to produce different results, or outright features, might be more suitable for further follow-up PRs so they can be considered separately.
  • The <Guides> class is instantiated via internal guides_list() function. We might want to consider the user-facing guides() function to return a <Guides> class directly.
  • I think the <GuideLegend> implementation could be a bit cleaner, if we were to change priority of text justification.
    • Currently, the priority is label.hjust > theme$legend.text.align > label.theme$hjust > default_justification (somewhat similar for title).
    • I'd propose the order become label.hjust > label.theme$hjust > theme$legend.text.align > default_justification. This allows us to capture many label/title parameters during construction in element_text(), which reduces the amount of parameters that need to be passed around.

teunbrand avatar Nov 14 '22 20:11 teunbrand

Hi @thomasp85,

As discussed, I've tried to make a few extensions to the ggproto guide system that you can find here: https://github.com/teunbrand/gguidance. If there is any functionality in there that we might consider having in vanilla ggplot2, I'd be happy to do a follow-up PR.

I've learned a few things by doing so:

  • It might be useful for testing purposes to export guides_list(). Alternatively, we could have guides() return a <Guides> object.
  • Primarily in <GuideLegend>, it might be good to double-check whether some things are present. For example, in case you don't need labels (because you're extending in a different way), it will still fuss over the justification and such.
  • It would be nice to be able to re-use the constructors for some guides. There is now an ugly hack to do this in guide_colourbar(), but I think it should be possible to re-use any constructor in a more elegant way. The is currently the hack: https://github.com/teunbrand/ggplot2/blob/9fd9f31fcfc6a93b2df58ae06a456c9d5d1b4c98/R/guide-colorbar.r#L208-L211

teunbrand avatar Jan 02 '23 18:01 teunbrand

Also stumbled on this bug:

library(ggplot2)

calc_element(
  "legend.title.align",
  theme_gray() + theme(legend.title.align = 0.5)
)
#> Error:
#> ! Theme element `legend.title.align` must have class <character>

calc_element(
  "legend.position",
  theme_gray() + theme(legend.position = c(0.5, 0.5))
)
#> Error:
#> ! Theme element `legend.position` must have class <character>

Created on 2023-01-02 by the reprex package (v2.0.1)

Whereas in the ?el_def documentation class is described as:

The name of the element class. Examples are "element_line" or "element_text" or "unit", or one of the two reserved keywords "character" or "margin". The reserved keyword "character" implies a character or numeric vector, not a class called "character".

I'll fix it here because this PR needs it, but if we're to reject this PR for some reason, this is a bug to open a separate issue about.

teunbrand avatar Jan 02 '23 22:01 teunbrand

Thanks for the review Thomas! These comments help a lot, as I might not see the forest for the trees anymore. I'll keep track of a TODO list based on your comments

  • [x] Change Map('[[<-', ...) pattern
  • [x] Rename variable panel_guides_grob()
  • [x] Use return(NULL) instead of if (is.null(x)) { return(x) }.
  • [x] Rename geom method to key_geom()/get_key_geom() (leaning towards get_layer_key())
  • [x] Update type checks

teunbrand avatar Mar 22 '23 20:03 teunbrand

There is also some lingering comments here https://github.com/tidyverse/ggplot2/pull/4879#pullrequestreview-1056998542

thomasp85 avatar Mar 24 '23 09:03 thomasp85

This should now be all up-to-date again with the suggested changes.

teunbrand avatar Mar 26 '23 14:03 teunbrand

Do we have any unit tests that ensure that plotting a ggplot2 object doesn't incur any state change on the object itself... I'm sure I've asked this before and forgotten

thomasp85 avatar Apr 11 '23 10:04 thomasp85

I added one. Was a bit puzzled how we could test this, but seems to work when using ~~hashes~~ serialisation of the objects.

teunbrand avatar Apr 11 '23 16:04 teunbrand

I was perhaps too optimistic. My test fails because I get the following weird mismatch between outside testthat and inside testthat (backtrace omitted):

suppressPackageStartupMessages({
  library(ggplot2)
  library(testthat)
  library(rlang)
})

old <- GeomPoint$draw_panel
hash_old <- hash(old)
new <- GeomPoint$draw_panel
hash_new <- hash(new)

# This is fine, we've hashed the exact same thing
hash_old == hash_new
#> [1] TRUE

test_that("draw_panel stays the same", {
  old <- GeomPoint$draw_panel
  hash_old <- hash(old)
  new <- GeomPoint$draw_panel
  hash_new <- hash(new)
  # This should be fine because we've hashed the exact same thing, but isn't???
  expect_equal(hash_old, hash_new)
})
#> ── Failure ('<text>:20'): draw_panel stays the same ────────────────────────────
#> `hash_old` not equal to `hash_new`.
#> 1/1 mismatches
#> x[1]: "ab35431580bc82d1e7bd57797ec36d86"
#> y[1]: "1d0e16c338b2f60d4e47c561f57b6609"
#> Error in `reporter$stop_if_needed()`:
#> ! Test failed

Created on 2023-04-11 with reprex v2.0.2

@thomasp85 if you have some wisdoms on how to test whether a ggproto object has mutated that'll be greatly appreciated πŸ˜…

EDIT: should be fixed by making an immutable copy by serialisation

teunbrand avatar Apr 11 '23 18:04 teunbrand

Maybe instead of testing we should make sure we have a very clear understanding of which values in the guides and guidelist are susceptible to change during a rendering pass, and then make sure that changes in these values does not leak into a new rendering pass...

thomasp85 avatar Apr 14 '23 08:04 thomasp85

This is the sort of situation I am eager to avoid https://github.com/tidyverse/ggplot2/pull/4475

thomasp85 avatar Apr 14 '23 08:04 thomasp85

I've reordered the Guides class a bit so that many methods follow the order in which they are called. I also added some clarifying comments about what the mutable fields are. Because the diff of the last commit is difficult to read, I've pointed towards the relevant bits in the comments above.

teunbrand avatar Apr 16 '23 20:04 teunbrand

Some magic github incantations: Fix #4879, Fix #4364, Fix #4930, Fix #2728, Fix #4650, Fix #4958, Fix #5152

teunbrand avatar Apr 21 '23 16:04 teunbrand

Good call to wait after the weekend, I just caught a bug that I should've caught before :)

teunbrand avatar Apr 23 '23 20:04 teunbrand

I don't think GitHub picks up fix keywords in comments so you may want to add them to the first message instead

thomasp85 avatar Apr 24 '23 08:04 thomasp85