ggplot2 geom_bar()/geom_col() erroneously warn that they ignore width aesthetic

geom_bar() and geom_col() let you specify a width aesthetic to control the width of the bars.

The behavior is as expected, but it generates an erroneous warning "Warning: Ignoring unknown aesthetics: width".

width isn't listed in the aesthetics section of ?geom_bar, so it appears that this is an unofficial behavior.

library(ggplot2)
suppressPackageStartupMessages(library(dplyr))
mtcars_by_cyl <- mtcars %>% 
  group_by(cyl) %>% 
  summarize(
    mean_wt = mean(wt),
    n = n()
  ) %>% 
  mutate(prop = n / sum(n))

ggplot(mtcars_by_cyl) + 
  geom_col(aes(cyl, mean_wt, width = prop))
#> Warning: Ignoring unknown aesthetics: width

Created on 2019-02-13 by the reprex package (v0.2.0).

Similar closed issues.

https://github.com/tidyverse/ggplot2/issues/1904
https://github.com/tidyverse/ggplot2/issues/2473

Feb 13 '19 21:02 richierocks

Currently, width is recognized as a parameter by a "hack". Here's the comment, written 4 years ago. Maybe it's worth trying to make width a proper aes?

https://github.com/tidyverse/ggplot2/blob/43dcd632fe96d412c13689454ffee366aaa39ce3/R/geom-bar.r#L130

Feb 14 '19 02:02 yutannihilation

Elsewhere (e.g. boxplot), width is added to the list of extra_params:

https://github.com/tidyverse/ggplot2/blob/03bd9461fd0ae236d15be6d215a42911518b18ee/R/geom-boxplot.r#L162
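
For reference, here is a minimal sketch of how to inspect this field in an installed ggplot2. It assumes the exported GeomBoxplot and GeomCol ggproto objects expose extra_params as a character vector. The contents depend on the ggplot2 version, so no output is shown.

```r
library(ggplot2)

# extra_params lists parameter names a geom accepts in addition to its
# aesthetics and the arguments of its draw methods.
boxplot_extra <- GeomBoxplot$extra_params
col_extra <- GeomCol$extra_params

# Is "width" registered as an extra parameter in this version?
boxplot_has_width <- "width" %in% boxplot_extra
col_has_width <- "width" %in% col_extra
boxplot_has_width
col_has_width
```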

Feb 17 '19 05:02 ptoche

width works just fine as a parameter in the way the code is currently written, and the "hack" is fine also. The question is whether width should be an aesthetic. I'm skeptical, because bars with varying widths are not normally meaningful. It's not that different a case from bars that start from a base value other than zero, which we also don't support. If people really want to do something like this, they can use geom_rect() or geom_tile() instead.

Feb 17 '19 05:02 clauswilke

The question is whether width should be an aesthetic

Isn't width already an aes? At least, the plot above seems to have varying widths of bars.

Feb 17 '19 06:02 yutannihilation

Sorry, I was confused. I now think the varying widths in geom_col() are just a mistake. It uses data$width, but that really should be "ignored", as the warning says.

https://github.com/tidyverse/ggplot2/blob/43dcd632fe96d412c13689454ffee366aaa39ce3/R/geom-col.r#L40-L46

In geom_bar()'s case, stat_count() provides the width, so it should be used. But geom_col() uses stat_identity(), from which we should not expect a width.

Feb 17 '19 12:02 yutannihilation

But, in terms of the interface (I don't mean the current behaviour is semantically correct), width is provided by a Stat via data. So, it is virtually an aes.

I'm wondering why width is not passed via param...

Feb 17 '19 12:02 yutannihilation

Oh, this last example reminds me of the need for varying width.

# You can specify a function for calculating binwidth,
# particularly useful when faceting along variables with
# different ranges

https://ggplot2.tidyverse.org/reference/geom_histogram.html
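
A sketch along the lines of the example on that page, using the economics_long dataset that ships with ggplot2. The helper name fd_binwidth and the use of layer_data() to read back the drawn widths are my additions for illustration.

```r
library(ggplot2)

# Bin width rule, evaluated on the data of each panel
# (there is one group per panel here).
fd_binwidth <- function(x) 2 * IQR(x) / (length(x)^(1 / 3))

p <- ggplot(economics_long, aes(value)) +
  facet_wrap(~variable, scales = "free_x") +
  geom_histogram(binwidth = fd_binwidth)

# Read back one bar width per panel from the computed layer data.
ld <- layer_data(p)
widths_by_panel <- tapply(
  ld$xmax - ld$xmin, ld$PANEL,
  function(w) signif(w[1], 6)
)
widths_by_panel
```

The widths differ between panels, which is the case where a single value passed via param would not be enough.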

Feb 17 '19 13:02 yutannihilation

Here's my understanding. Is this correct?

We want to enforce a constant bar width within a panel, so width cannot be an aes.
Yet, the width can vary among panels, so we need to pass widths per bar via data, not a single value via param.
data$width should be used only when the Stat provides it. But, geom_col() is not the case, it should ignore data$width.

Feb 17 '19 13:02 yutannihilation

Just a comment in passing: If width were to be passed as an aes to capture the relative amounts of some variable in the dataset, the bar-chart would become a sort of rectangular-shaped pie-chart, where the area --- not the length --- becomes the relevant metric (I don't think that's "meaningless", but would suffer from most of the problems that pie-charts have). As far as I can tell, Hadley (among others) is not fond of pie-charts.

For the standard bar-chart with "meaningless" width, I would argue that the current default width of geom_bar is too wide: narrower bars would help the eye focus on the important metric, which is height. Excel and LibreOffice/Calc seem to go for a default of 100%, i.e. the space between bars = width of the bars. geom_bar is wider than that. Does anyone else think it ought to be narrower?

library("reprex")

library("ggplot2")
ggplot(mtcars, aes(x = gear)) + geom_bar()


ggplot(mtcars, aes(x = gear)) + geom_bar(width = 0.5)


ggplot(mtcars, aes(x = gear)) + geom_bar(width = 0.25)

Created on 2019-02-18 by the reprex package (v0.2.1)
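
To put a number on the default: the geom_bar() documentation describes the default width as 90% of the resolution of the data. Here is a small sketch that reads this back from the computed layer data, assuming layer_data() reports the xmin/xmax that are actually drawn.

```r
library(ggplot2)

p_default <- ggplot(mtcars, aes(x = gear)) + geom_bar()

# gear takes the values 3, 4 and 5, so the resolution of x is 1
# and each bar should span 0.9 units.
ld <- layer_data(p_default)
default_widths <- ld$xmax - ld$xmin
default_widths
```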

Feb 18 '19 05:02 ptoche

Just to comment here that allowing width as an aesthetic can be used to have different "sized" pies in pie charts, which is quite useful (I mean, as useful as a pie chart can be...):

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)

d <- mtcars %>% 
  group_by(am) %>% 
  count(cyl) %>% 
  mutate(total = sum(n),
         norm_n = n / total)

(p <- ggplot(d, aes(0, norm_n, fill = factor(cyl))) + 
    facet_grid(cols = vars(am)) + 
    geom_col(aes(width = total), position = position_stack()))
#> Warning: Ignoring unknown aesthetics: width

p + aes(x = total/2) + coord_polar("y")

Created on 2021-08-12 by the reprex package (v2.0.0)

Aug 12 '21 06:08 mattansb

Variable width is useful for bar charts by month, to prevent the bars from overlapping. Especially if you want no gaps between the bars, but also because you'll get inconsistent gaps otherwise.

You can hack it by making the dates a factor instead, but then you need to do much more work to get a nice date axis.

library(dplyr, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)
library(ggplot2, warn.conflicts = FALSE)

set.seed(1)

df <- tibble(
  date = seq.Date(ymd("2020-01-01"), ymd("2020-12-01"), by = "1 month"),
  quantity = sample(20:100, 12),
  ndays = days_in_month(date)  # width for different months
) |> 
  mutate(date = date + ndays / 2)  # reposition to fix the overlaps

df
#> # A tibble: 12 × 3
#>    date       quantity ndays
#>    <date>        <int> <int>
#>  1 2020-01-16       87    31
#>  2 2020-02-15       58    29
#>  3 2020-03-16       20    31
#>  4 2020-04-16       53    30
#>  5 2020-05-16       62    31
#>  6 2020-06-16       33    30
#>  7 2020-07-16       78    31
#>  8 2020-08-16       70    31
#>  9 2020-09-16       40    30
#> 10 2020-10-16       73    31
#> 11 2020-11-16       26    30
#> 12 2020-12-16       56    31

df |> 
  ggplot() +
  geom_col(aes(date, quantity, width = ndays), alpha = 0.7)
#> Warning in geom_col(aes(date, quantity, width = ndays), alpha = 0.7): Ignoring
#> unknown aesthetics: width

Created on 2023-11-30 with reprex v2.0.2

Nov 30 '23 15:11 olivermagnanimous

I understand that we do not want to encourage bar charts with variable widths; however, I do think enforcing this is causing us more pain than gain. I'd like to challenge some points in favour of not recognising width as an aesthetic.

width works just fine as a parameter in the way the code is currently written,

Not really. It throws warnings about being ignored, while it is being used.

the "hack" is fine also

While the hack works to recognise the parameter, we wouldn't need the hack at all if it were a proper aesthetic.

If people really want to do something like this, they can use geom_rect() or geom_tile() instead.

geom_tile() is not a good alternative, for two reasons. The height aesthetic is not a position aesthetic, so it does not respond to scale transformations. Scale-transformed bar charts are probably a bad idea anyway, but I don't think we should prohibit it. Secondly, you have to use y = after_stat(count / 2) when pairing a bar chart with a stat, which is clunky.
geom_rect() is not a good alternative, also for two reasons. You have to specify ymin = 0, which is clunky. More importantly, when using a discrete x variable, the xmin and xmax are a pain to compute, because you'd have to manually convert the discrete variable into a continuous one.
If you want to solve most of these issues, you'd want a geom that has x/width parametrisation for the horizontal direction, but ymin/ymax parametrisation for the vertical direction. This geom does not exist.
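
To make the clunkiness concrete, here is a minimal sketch of both workarounds for the data from the original report, with cyl treated as discrete. The summary is rebuilt with base R so that only ggplot2 is needed; the pos column is a helper introduced for this illustration.

```r
library(ggplot2)

# Summary from the original report, rebuilt with base R.
mean_wt <- tapply(mtcars$wt, mtcars$cyl, mean)
n <- as.numeric(table(mtcars$cyl))
mtcars_by_cyl <- data.frame(
  cyl = factor(names(mean_wt)),
  mean_wt = as.numeric(mean_wt),
  prop = n / sum(n)
)

# geom_rect(): ymin must be given explicitly, and the discrete x has to be
# converted to its integer position by hand to compute xmin/xmax.
mtcars_by_cyl$pos <- as.numeric(mtcars_by_cyl$cyl)
p_rect <- ggplot(mtcars_by_cyl) +
  geom_rect(aes(xmin = pos - prop / 2, xmax = pos + prop / 2,
                ymin = 0, ymax = mean_wt)) +
  scale_x_continuous(breaks = mtcars_by_cyl$pos,
                     labels = levels(mtcars_by_cyl$cyl))

# geom_tile(): width is an accepted aesthetic, but y has to be the bar's
# midpoint and height is not a position aesthetic.
p_tile <- ggplot(mtcars_by_cyl) +
  geom_tile(aes(x = cyl, y = mean_wt / 2, width = prop, height = mean_wt))
```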

data$width should be used only when the Stat provides it. But, geom_col() is not the case, it should ignore data$width.

Ideally, the geom shouldn't care whence the width data came. Baking in prohibitions for specific geom/stat pairings hurts the flexibility of the API and should, in my opinion, only ever be used to enhance displays, not prohibit them.

I'd also like to re-iterate some points in favour of width as aesthetic.

We already allow bars with varying width directly from the aesthetics. Sure, we throw a warning in protest, but then promptly display the bars as people intended anyway. We can even circumvent this warning by using ggplot(..., mapping = aes(..., width = var)) as it'll end up in the layer data even for layers that don't have width as an aesthetic or parameter.
There are valid use-cases from a user perspective, as pointed out elsewhere in this issue.
There are valid use-cases from a developer perspective, such as when width comes from a position adjustment, stat computation, or needs to vary between panels.
Maintaining width as a proper aesthetic is easier than relying on the current hack.
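
A minimal sketch of the circumvention mentioned in the first point, using a made-up three-row data frame (df and its column w are illustrative only). It assumes layer_data() reflects the xmin/xmax that geom_col() derives from the width column.

```r
library(ggplot2)

df <- data.frame(x = c(1, 2, 3), y = c(2, 4, 3), w = c(0.2, 0.5, 0.9))

# Layer-level mapping: in the ggplot2 versions discussed in this thread,
# this is where "Ignoring unknown aesthetics: width" is reported.
p_layer <- ggplot(df) + geom_col(aes(x, y, width = w))

# Plot-level mapping: width is inherited into the layer data without the
# layer checking it against its known aesthetics.
p_global <- ggplot(df, aes(x, y, width = w)) + geom_col()

# The drawn bar widths follow w in the plot-level version.
ld_global <- layer_data(p_global)
global_widths <- ld_global$xmax - ld_global$xmin
global_widths
```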

In summary, the main argument against width as an aesthetic is that it might possibly encourage some bad visualisation. However, we can't stop people from doing this anyway and having ggplot2 jump through hoops to discourage this is causing discomfort in the shape of hacks and spurious warnings. Therefore, I argue we should just let width be an aesthetic.

Mar 26 '24 09:03 teunbrand

@teunbrand Let me go back on my argument from six years ago. While I still think one has to be careful with variable widths in a plot, these days I also believe plotting software should be maximally flexible and not impose specific design philosophies on its users. So unless there's a good technical reason not to have width as an aesthetic, I don't see how we lose in any way by making it one.

Mar 26 '24 16:03 clauswilke

Thanks Claus, it seems we are in alignment over this then. I didn't mean to single out your arguments (and I'm sorry if it appeared that way). I just felt that this issue was stuck in a weird place of being acknowledged and having proposed solutions, but being dormant for a while. Hopefully my arguing gets folks on board with the 'width as aesthetic' approach, so we can move forward on this issue.

Mar 26 '24 17:03 teunbrand

No worries, I didn't feel singled out. In fact, I was surprised by my own comment from 2019 as today I don't think I would write it. (I came here thinking: let me argue in favor of width as an aesthetic and let's see who the idiot was that argued against it. Well, it was me apparently. 🤣)

Mar 26 '24 17:03 clauswilke

I appreciate @teunbrand's take on the subject, but -- and I may well have missed something -- I am not sure the proposed fix works as intended. See comment at #5807.

A typical example of valid use of bar width is for multiplicative units, such as average price x number of units to get a transaction volume displayed as area. Such graphs are quite common in physics / engineering / climate science, etc.

May 23 '24 09:05 katossky