
Use label-attribute as default axis/legend title

Open crsh opened this issue 2 years ago • 4 comments

Currently, axis and legend titles default to the column names of the data.frame. There are several packages that provide functions to attach more comprehensive variable labels to data.frame columns (e.g., Hmisc, tinylabels, labelled, and sjlabelled). While these packages differ with respect to the specific implementation, all implementations are compatible in the sense that they attach the variable labels to the data.frame columns via the label attribute. I think it would be useful to use such labels, if available, as they can probably be expected to be more informative than the bare column names.
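
As a minimal illustration of that shared mechanism, the sketch below sets and reads a label using base R only. `column_label()` is a hypothetical helper introduced here for illustration; it is not part of any of the packages mentioned.

```r
# Attach a variable label with base R; this is the same "label" attribute
# that the labelling packages are described as using.
df <- data.frame(
  wt  = c(2.62, 2.875, 2.32),
  mpg = c(21, 21, 22.8)
)
attr(df$wt, "label") <- "Weight (1000 lbs)"

# Hypothetical helper: return the label of a column, falling back to the
# column name when no label is set.
column_label <- function(data, column) {
  label <- attr(data[[column]], "label", exact = TRUE)
  if (is.null(label)) column else label
}

column_label(df, "wt")   # labelled column
column_label(df, "mpg")  # unlabelled column, falls back to the name
```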

Consider the following example. First, I'll create some labelled data (I think tinylabels provides the safest implementation, but any of the below will work).

library("dplyr")
library("ggplot2")

# Set up data & labels using {tinylabels}
library("tinylabels")
mtcars2 <- mtcars |>
  mutate(gear = factor(gear)) |>
  label_variables(
    wt = "Weight (1000 lbs)",
    gear = "Gears",
    vs = "Engine"
  )

# # Set up data & labels using {Hmisc}
# library("Hmisc")
# mtcars2 <- mtcars |>
#   mutate(gear = factor(gear)) |>
#   within({
#     label(gear) <- "Gears"
#     label(wt) <- "Weight (1000 lbs)"
#     label(vs) <- "Engine"
#   })
# 
# # Set up data & labels using {sjlabelled}
# library("sjlabelled")
# mtcars2 <- mtcars |>
#   mutate(gear = factor(gear)) |>
#   var_labels(
#     wt = "Weight (1000 lbs)",
#     gear = "Gears",
#     vs = "Engine"
#   )

For demonstration purposes I slightly change ggplot_add.labels to use the label attribute, if available:

ggplot_add.labels <- function (object, plot, object_name) {
  object <- add_variable_labels(object, plot) # Newly added code
  ggplot2::update_labels(plot, object)
}

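add_variable_labels <- function(labels, plot) {
  # Only bare column references (e.g. aes(x = wt)) can be matched to a column;
  # constants and computed mappings are skipped instead of raising an error
  is_simple <- vapply(plot$mapping, function(x) {
    rlang::is_quosure(x) && rlang::quo_is_symbol(x)
  }, logical(1))
  vars <- vapply(plot$mapping[is_simple], rlang::as_name, character(1))

  # Keep labels that were supplied explicitly
  vars <- vars[!names(vars) %in% names(labels)]

  # Loop over aesthetics so that a column mapped to several aesthetics works
  for (aesthetic in names(vars)) {
    variable_label <- attr(plot$data[[vars[[aesthetic]]]], "label", exact = TRUE)
    if (!is.null(variable_label)) {
      labels[[aesthetic]] <- variable_label
    }
  }

  labels
}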

assignInNamespace("ggplot_add.labels", ggplot_add.labels, "ggplot2")

With this change, axis and legend titles default to the label attribute when labs() is called:

p <- ggplot(mtcars2, aes(x = wt, y = mpg, colour = gear)) +
  geom_point()

# Standard behavior (uses column names)
p 

# Uses labels, where available
p + labs()

# Overwrite defaults
p + labs( 
  y = "Fuel economy (mpg)"
  , color = "Bears"
)

I think this would be a very nice feature and I'd be happy to take a stab at it.

If this is of interest, I see three options for a proper implementation:

  1. Always default to using labels (after some exploration, the required changes seem manageable).
  2. Only default to using labels when labs() is called (pretty much what I have implemented above).
  3. Only default to using labels when labs() is called and labels are requested (e.g., use_labels = TRUE).
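
An explicit opt-in along the lines of option 3 can be approximated without touching ggplot2 internals. The following is only a sketch under that assumption: `label_defaults()` is a hypothetical helper (not a ggplot2 function) that collects the label attribute of every column mapped as a bare symbol, and the result is then passed to labs() by hand.

```r
library("ggplot2")

# Hypothetical helper (not part of ggplot2): return a named list of titles,
# one per aesthetic whose mapped column carries a "label" attribute.
label_defaults <- function(data, mapping) {
  labels <- list()
  for (aesthetic in names(mapping)) {
    quo <- mapping[[aesthetic]]
    # Skip constants and computed mappings; only bare symbols name a column
    if (!rlang::is_quosure(quo) || !rlang::quo_is_symbol(quo)) next
    label <- attr(data[[rlang::as_name(quo)]], "label", exact = TRUE)
    if (!is.null(label)) labels[[aesthetic]] <- label
  }
  labels
}

df <- data.frame(wt = c(2.62, 2.875, 2.32), mpg = c(21, 21, 22.8))
attr(df$wt, "label") <- "Weight (1000 lbs)"

mapping <- aes(x = wt, y = mpg)
defaults <- label_defaults(df, mapping)

# Opt in explicitly by handing the collected titles to labs()
p <- ggplot(df, mapping) +
  geom_point() +
  do.call(labs, defaults)
```

Because a later labs() call overrides earlier ones, titles set manually afterwards still take precedence over these defaults.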

There are probably other sensible options that I'm not seeing. Either way, I'd be very interested to know if this would be a PR you would be willing to consider and in any thoughts you may have on this.

crsh • Sep 28 '21 20:09

Does anyone have thoughts on this? The PR could also add support for the class haven_labelled.

crsh • Dec 16 '21 15:12

I think this is likely to be a good idea. But it requires some thought to determine if it's ok to enable by default. It will affect existing plots, but generally in a positive way, and it generally won't affect publication-ready plots because they should have already had their labels set manually. So I think a PR would be a great next step 😄

hadley • Apr 19 '22 12:04

This seems reasonable to implement when a mapping is a straightforward symbol, e.g. aes(x = mpg), but how should this work when the mapping is a computation like aes(x = log10(mpg + 1))? Should the label be extracted after aesthetics are evaluated?

teunbrand • Apr 17 '24 08:04

I think it would only apply to simple mappings.

hadley • Apr 17 '24 13:04
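
Whether a mapping is "simple" in this sense can be checked on the quosures that aes() stores. The thread does not say how this would be implemented; the sketch below assumes the rlang predicates quo_is_symbol() and quo_is_call() are used for the check.

```r
library("ggplot2")

simple   <- aes(x = mpg)
computed <- aes(x = log10(mpg + 1))

# A bare column reference is stored as a quosure wrapping a symbol, so the
# column (and hence its label attribute) can be looked up by name.
rlang::quo_is_symbol(simple[["x"]])

# A computed mapping wraps a call instead; there is no single column whose
# label could be reused, so it would keep the current default title.
rlang::quo_is_symbol(computed[["x"]])
rlang::quo_is_call(computed[["x"]])
```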