Data points disappearing when `lims()` conflicts with `position_dodge()`

Open DanChaltiel opened this issue 1 year ago • 4 comments

Hi,

It is a bit unexpected that lims() considers the actual position of each point (after dodging) rather than its original position.

This makes some points disappear while the real value is within the limits.

For instance, consider:

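```r
library(tidyverse)

# Two groups with identical x/y values, so dodging shifts them apart along x
dat <- tibble(x = rep(0:4, 2), y = rep(0:4, 2), gp = rep(c("A", "B"), each = 5))
p <- ggplot(dat, aes(x, y, color = gp)) + geom_point(position = position_dodge(0.5))
p

# With a lower x limit of 0, one of the dodged points at x = 0 is dropped
p + xlim(0, NA)
#> Warning: Removed 1 rows containing missing values (`geom_point()`).
```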

Created on 2023-01-12 with reprex v2.0.2
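To see where the missing row comes from, it may help to look at the built layer data rather than the plot. The following is a minimal sketch, not part of the original reprex: it uses a plain data.frame instead of a tibble, and the names `built` and `censored` are only illustrative. It assumes that `layer_data()` returns positions after the position adjustment has been applied, which is how the built data behaves in the ggplot2 versions I am aware of.

```r
library(ggplot2)

# Same data as in the reprex, as a base data.frame
dat <- data.frame(x = rep(0:4, 2), y = rep(0:4, 2), gp = rep(c("A", "B"), each = 5))
p <- ggplot(dat, aes(x, y, colour = gp)) +
  geom_point(position = position_dodge(0.5))

# Built data without limits: x is the dodged position, not the value in `dat`
built <- layer_data(p)
built[, c("x", "y", "group")]

# Built data with the scale limit: dodged positions below 0 are censored to NA
censored <- layer_data(p + xlim(0, NA))
sum(is.na(censored$x))
```

If the assumption holds, one group's point at x = 0 sits slightly below 0 after dodging, which is why the scale limit treats it as out of bounds.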

I cannot see how this could be intended behavior, as dodging is only about visualization and not about showing actual data. However, if it is, it may be worth documenting, for instance in ?position_dodge and/or in ?lims.

Jan 12 '23 16:01 DanChaltiel

I think the ?lims documentation is pretty clear that out-of-bounds data gets censored to NA. Is your suggestion to add to these docs that this also applies to position adjustments?

Jan 12 '23 17:01 teunbrand

Yes, that was my last suggestion, but I think it would make a lot more sense not to consider position adjustments when censoring out-of-bounds data. Do you consider the censoring in my second example plot to be expected and intended behavior? IMHO, lims() is about selecting a range of data points, not a range of exact coordinates.

Jan 12 '23 17:01 DanChaltiel

At the point where the data is censored, there is no longer any information about what the original position was, so keeping track of that is not straightforward (but that is not a reason not to do it). Mentioning position adjustments in the docs is a lot easier in comparison.

I would expect the point to be censored because that is what ?lims tells me to expect. If I wanted to keep out-of-bounds observations, I'd use the coord_cartesian(xlim = ...) argument, or explicitly state scale_x_continuous(limits = ..., oob = scales::oob_keep).
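The two alternatives mentioned above could look roughly like this when applied to the plot from the opening post. This is a sketch under a few assumptions: the data is rebuilt as a plain data.frame, the names `p_zoom` and `p_keep` are only illustrative, and the upper limit of 4 passed to `coord_cartesian()` is chosen here to match the example data rather than taken from the thread.

```r
library(ggplot2)

# Same data and plot as in the opening post
dat <- data.frame(x = rep(0:4, 2), y = rep(0:4, 2), gp = rep(c("A", "B"), each = 5))
p <- ggplot(dat, aes(x, y, colour = gp)) +
  geom_point(position = position_dodge(0.5))

# Option 1: zoom with the coord; scale limits stay untouched, nothing is censored
p_zoom <- p + coord_cartesian(xlim = c(0, 4))

# Option 2: keep the scale limit but switch the out-of-bounds handling to "keep"
p_keep <- p + scale_x_continuous(limits = c(0, NA), oob = scales::oob_keep)

# Neither variant should contain censored (NA) positions in the built data
sum(is.na(layer_data(p_zoom)$x))
sum(is.na(layer_data(p_keep)$x))
```

With the default axis expansion, the dodged points just outside the limits should still fall inside the drawn panel in both variants.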

Whether it is wanted ties in with intent, and that can vary from one plot to the next. In the example, there is a continuous x-axis whereas dodging is typically done on discrete x-axes. On discrete axes, definitely the observation should be kept. On continuous axes, it is more of a mixed bag to understand what the intent is. I think defaulting to a literal interpretation of the instructions is the right way to go.

Jan 12 '23 18:01 teunbrand

Indeed, coord_cartesian(xlim = ...) does the trick. I think this would be worth mentioning in the doc, but rather in ?position_dodge(). For what it's worth, this is where I would look for this information.

I think keeping track of the original position would make a lot of sense. As a non-expert user, I'd expect ?position_dodge() to change where the point appears, but not to actually change its coordinates. But, as you say, this would probably require a lot of work that might not be worth it.

Note that the original coordinates might be somewhere to be found, as running plotly::ggplotly() allows us to show them on hover.

Jan 12 '23 18:01 DanChaltiel
