ggplot2
Data points disappearing when `lims()` conflicts with `position_dodge()`
Hi,
It is a bit unexpected that `lims()` considers the final position of each point (after dodging) rather than its original position.
This makes some points disappear even though their actual value is within the limits.
For instance, consider:
library(tidyverse)
dat=tibble(x=rep(0:4, 2), y=rep(0:4, 2), gp=rep(c("A","B"), each=5))
p = ggplot(dat, aes(x,y, color=gp)) + geom_point(position=position_dodge(0.5))
p
p + xlim(0,NA)
#> Warning: Removed 1 rows containing missing values (`geom_point()`).
Created on 2023-01-12 with reprex v2.0.2
I cannot see how this can be intended behavior, since dodging only affects how the data is displayed, not the data itself. However, if it is, it might be worth documenting, for instance in `?position_dodge` and/or `?lims`.
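To make the mechanism concrete, here is a minimal sketch. It assumes `layer_data()` is a fair way to inspect the built plot, and it uses a plain `data.frame` in place of the tibble above; the variable names `dodged_x` and `limited_x` are only for illustration. It suggests that the dodged x positions, not the original ones, are what get compared against the limits:

```r
library(ggplot2)

dat <- data.frame(
  x  = rep(0:4, 2),
  y  = rep(0:4, 2),
  gp = rep(c("A", "B"), each = 5)
)
p <- ggplot(dat, aes(x, y, colour = gp)) +
  geom_point(position = position_dodge(0.5))

# x positions after dodging: group A is shifted left and group B right,
# so the group A point at x = 0 ends up slightly below 0
dodged_x <- layer_data(p)$x
range(dodged_x)

# With a lower limit of 0, the shifted point is out of bounds,
# so its x value is censored to NA when the plot is built
limited_x <- layer_data(p + xlim(0, NA))$x
sum(is.na(limited_x))
```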
I think the `?lims` documentation is pretty clear that out-of-bounds data gets censored to `NA`. Is your suggestion to add to these docs that this also applies to position adjustments?
Yes, that was my second suggestion, but I think it would make a lot more sense not to consider position adjustments when censoring out-of-bounds data.
Do you consider the censoring in my second example plot to be expected and desired behavior?
IMHO, `lims()` is about selecting a range of data points, not a range of exact coordinates.
By the time the data is censored, the original position is no longer available, so keeping track of it is not straightforward (though that alone is not a reason not to do it). Mentioning position adjustments in the docs is a lot easier in comparison.
I would expect the point to be censored because that is what `?lims` tells me to expect. If I wanted to keep out-of-bounds observations, I'd use the `coord_cartesian(xlim = ...)` argument, or explicitly state `scale_x_continuous(limits = ..., oob = scales::oob_keep)`.
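For reference, a minimal sketch of those two alternatives applied to the example from the original post. The upper limit of `4.5` passed to `coord_cartesian()` and the names `p_zoom` and `p_keep` are assumptions made for illustration, not something taken from the thread:

```r
library(ggplot2)

dat <- data.frame(
  x  = rep(0:4, 2),
  y  = rep(0:4, 2),
  gp = rep(c("A", "B"), each = 5)
)
p <- ggplot(dat, aes(x, y, colour = gp)) +
  geom_point(position = position_dodge(0.5))

# Option 1: zoom the coordinate system; the scale does not censor anything
p_zoom <- p + coord_cartesian(xlim = c(0, 4.5))

# Option 2: set scale limits, but keep out-of-bounds values instead of censoring them
p_keep <- p + scale_x_continuous(limits = c(0, NA), oob = scales::oob_keep)

# In both cases no x position should be turned into NA
anyNA(layer_data(p_zoom)$x)
anyNA(layer_data(p_keep)$x)
```

Both variants leave the dodged positions untouched; whether the shifted point is actually visible still depends on the panel range and its expansion.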
Whether it is wanted ties in with intent, and that can vary from one plot to the next. In the example, there is a continuous x-axis whereas dodging is typically done on discrete x-axes. On discrete axes, definitely the observation should be kept. On continuous axes, it is more of a mixed bag to understand what the intent is. I think defaulting to a literal interpretation of the instructions is the right way to go.
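As a rough illustration of the discrete case, the sketch below converts `x` to a factor and sets explicit discrete limits. The conversion with `factor()` and the name `p_discrete` are assumptions for this example; as far as I can tell, a discrete position scale does not censor the dodged numeric positions, so all observations are kept:

```r
library(ggplot2)

dat <- data.frame(
  x  = rep(0:4, 2),
  y  = rep(0:4, 2),
  gp = rep(c("A", "B"), each = 5)
)

# Same data, but x is treated as discrete and the limits are given as categories
p_discrete <- ggplot(dat, aes(factor(x), y, colour = gp)) +
  geom_point(position = position_dodge(0.5)) +
  scale_x_discrete(limits = as.character(0:4))

# Dodged positions around each category; none should be censored to NA
discrete_x <- layer_data(p_discrete)$x
anyNA(discrete_x)
```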
Indeed, `coord_cartesian(xlim = ...)` does the trick. I think this would be worth mentioning in the docs, preferably in `?position_dodge()`. For what it's worth, this is where I would look for this information.
I think keeping track of the original position would make a lot of sense. As a non-expert user, I'd expect `position_dodge()` to change where the point appears, but not to actually change its coordinates.
But, as you say, this would probably require a lot of work that might not be worth it.
Note that the original coordinates might still be available somewhere, since running `plotly::ggplotly()` shows them on hover.