ggplot2 icon indicating copy to clipboard operation
ggplot2 copied to clipboard

Move call to st_transform to transform

Open thomasp85 opened this issue 7 years ago • 14 comments

The current implementation of CoordSf has the crs transformation implemented in the setup_data() method. This is, in my view the wrong place to have it and a departure from the other Coord where coordinate transformations happens at the very end inside the Geom$draw_*() method using Coord$transform().

Rather than just being a little inconsistency, it has implications for gganimate and how it might let an animation change crs parameters during an animation.

If this is a change you'd accept I'd be happy to make a PR

thomasp85 avatar Jul 05 '18 09:07 thomasp85

Sure

hadley avatar Jul 05 '18 13:07 hadley

@edzer before I begin poking at this: Was there technical reason for putting the transform this early or did it just seem right?

thomasp85 avatar Jul 05 '18 13:07 thomasp85

@thomasp85 I was not involved in where this happens, and know too little about ggplot2 internals to say something intelligible about it.

edzer avatar Jul 05 '18 13:07 edzer

This may have implications for setup_panel_params() and range() (not currently implemented in coord_sf(), but see PR #2832).

Also, currently (after application of PR #2832), geom_hline() etc. only work as expected in longlat coordinates. When we transform (either the dataset or via coord_sf()), the line falls in the wrong location and doesn't follow the non-linear coordinate system. This is different from the behavior of coord_map(). Ideally, the transform() function in coord_sf() would recognize that the data geom_hline() provides are not transformed and would transform them accordingly.

library(ggplot2)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"))
#> Reading layer `nc' from data source `/Library/Frameworks/R.framework/Versions/3.5/Resources/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs

ggplot(nc) + geom_sf() + geom_hline(yintercept = 35)

ggplot(nc) + geom_sf() + geom_hline(yintercept = 35) + coord_sf(crs = 5070)

ggplot(nc) + geom_sf() + geom_hline(yintercept = 1500000) + coord_sf(crs = 5070)

Created on 2018-08-16 by the reprex package (v0.2.0).

clauswilke avatar Aug 17 '18 03:08 clauswilke

@thomasp85 I think the implementation problem is going to be that by the time the data reaches the transform() function the crs info has been stripped. Not sure if it is possible to keep it around. It would be really good if it was possible to do so.

library(ggplot2)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"))
#> Reading layer `nc' from data source `/Library/Frameworks/R.framework/Versions/3.5/Resources/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
nc_layer <- layer_data(ggplot(nc) + geom_sf())

# original data is of class "sf", has crs info
inherits(nc, "sf")
#> [1] TRUE
# layer data generated by ggplot is not of class "sf", doesn't have crs info
inherits(nc_layer, "sf")
#> [1] FALSE

head(nc)
#> Simple feature collection with 6 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -81.74107 ymin: 36.07282 xmax: -75.77316 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
#>    AREA PERIMETER CNTY_ CNTY_ID        NAME  FIPS FIPSNO CRESS_ID BIR74
#> 1 0.114     1.442  1825    1825        Ashe 37009  37009        5  1091
#> 2 0.061     1.231  1827    1827   Alleghany 37005  37005        3   487
#> 3 0.143     1.630  1828    1828       Surry 37171  37171       86  3188
#> 4 0.070     2.968  1831    1831   Currituck 37053  37053       27   508
#> 5 0.153     2.206  1832    1832 Northampton 37131  37131       66  1421
#> 6 0.097     1.670  1833    1833    Hertford 37091  37091       46  1452
#>   SID74 NWBIR74 BIR79 SID79 NWBIR79                       geometry
#> 1     1      10  1364     0      19 MULTIPOLYGON (((-81.47276 3...
#> 2     0      10   542     3      12 MULTIPOLYGON (((-81.23989 3...
#> 3     5     208  3616     6     260 MULTIPOLYGON (((-80.45634 3...
#> 4     1     123   830     2     145 MULTIPOLYGON (((-76.00897 3...
#> 5     9    1066  1606     3    1197 MULTIPOLYGON (((-77.21767 3...
#> 6     7     954  1838     5    1237 MULTIPOLYGON (((-76.74506 3...

head(nc_layer)
#>                         geometry PANEL group      xmin      xmax     ymin
#> 1 MULTIPOLYGON (((-81.47276 3...     1    -1 -84.32385 -75.45698 33.88199
#> 2 MULTIPOLYGON (((-81.23989 3...     1    -1 -84.32385 -75.45698 33.88199
#> 3 MULTIPOLYGON (((-80.45634 3...     1    -1 -84.32385 -75.45698 33.88199
#> 4 MULTIPOLYGON (((-76.00897 3...     1    -1 -84.32385 -75.45698 33.88199
#> 5 MULTIPOLYGON (((-77.21767 3...     1    -1 -84.32385 -75.45698 33.88199
#> 6 MULTIPOLYGON (((-76.74506 3...     1    -1 -84.32385 -75.45698 33.88199
#>       ymax linetype alpha stroke
#> 1 36.58965        1    NA    0.5
#> 2 36.58965        1    NA    0.5
#> 3 36.58965        1    NA    0.5
#> 4 36.58965        1    NA    0.5
#> 5 36.58965        1    NA    0.5
#> 6 36.58965        1    NA    0.5

Created on 2018-08-16 by the reprex package (v0.2.0).

clauswilke avatar Aug 17 '18 03:08 clauswilke

One more post. Actually, the crs info is retained in the geometry column, so that's not a major hurdle. The bigger question is what to do with regular position columns, such as x, y, xmin, xmax, etc. Ideally they would also have to be transformed, but only (x, y) pairs can be transformed using a geospatial coordinate system. Maybe the transformation could be limited to data frames that have exactly one x and one y column (coord_map() seems to do that). That would cover many basic geoms. Also, the coord might need a switch to determine whether or not to transform non-geometry columns, and if so how to interpret the data in those columns. This sounds like a major rework of coord_sf().

clauswilke avatar Aug 17 '18 05:08 clauswilke

I think the sf support needs some love anyway🙂. But the current transform already take a stance on what to do with non-sf layers (ignore them) so your last point is not really specific to moving the transform operation.

thomasp85 avatar Aug 17 '18 08:08 thomasp85

@thomasp85 I think these issues are sufficiently interwoven that it makes sense to tackle them at the same time, or at least think about them at the same time. Note that the current transform() doesn't ignore non-sf layers, it treats them identical to sf layers. It assumes everything comes in a common CRS and then scales everything to the previously identified x and y ranges in that CRS: https://github.com/tidyverse/ggplot2/blob/5f868c598e88c7096896c0c8c37cbd34a3fb8e47/R/sf.R#L300-L315 (The comment in line 307 is misplaced because it essentially applies to the entire function.)

If the geometry data don't arrive in a common CRS, then it becomes a somewhat open question in what CRS the non-geometry data and the panel ranges should arrive. And also, you may have to transform the data twice, once in setup_panel_params() to identify the ranges and generate graticules and once in transform() to draw.

clauswilke avatar Aug 17 '18 14:08 clauswilke

Ah - I was referring to the current implementation in master l, which do ignore it. I personally think it makes sense to treat non-sf layers as lat-long values and transform based on the given crs

You may be right hat this should all be tackled at once. I’ll carve some time out next week to get acquainted with your PR

thomasp85 avatar Aug 17 '18 15:08 thomasp85

I personally think it makes sense to treat non-sf layers as lat-long values and transform based on the given crs

I think it needs to be configurable, if only for backwards compatibility.

clauswilke avatar Aug 17 '18 18:08 clauswilke

I'm fairly sure that the call to st_transform() occurs in CoordSf$setup_data() because it ensures that the scales are trained in the proper coordinate system (I imagine you all know this I just didn't see it mentioned above). It's possible to do this in the Stat instead, transforming only the extent (I do this with rasters in ggspatial).

I've also wondered why non-spatial layers aren't assumed as WGS84...I implemented a stat_spatial_identity() that makes this work for point geometries, but obviously won't work for anything else. https://github.com/paleolimbot/ggspatial/blob/master/R/geom-spatial.R#L38

paleolimbot avatar Jun 17 '19 15:06 paleolimbot

I've also wondered why non-spatial layers aren't assumed as WGS84

I have been thinking about this quite a bit. As far as I can tell, the largest problem is repeated transformations. Ideally, we would hold all coordinates for all layers in WGS84 until draw time. This would mean transforming all spatial layers to WGS84, and then transforming to the desired CRS twice, once to figure out the scale extents and once to actually draw. That seems rather inefficient.

clauswilke avatar Jun 17 '19 15:06 clauswilke

The following is some code I use to train the scales for rasters in ggspatial...basically it creates a grid in the source CRS, projects it, then gets the extent in the destination CRS. If it's decided that the st_transform() call really belongs in Coord$transform(), something like this could be added to StatSf:

proj_grid <- sf::st_make_grid(proj_corners, n = 10, what = "corners")
out_grid <- sf::st_transform(proj_grid, crs = to_crs)
out_bbox <- sf::st_bbox(out_grid)

paleolimbot avatar Jul 07 '19 12:07 paleolimbot

@thomasp85 is this still a barrier for gganimate?

teunbrand avatar Jun 25 '23 15:06 teunbrand

I have played around with this a bit, and it should be doable to move the transformation step. This step is after scale training, so we can't train scales the usual way. Instead, you can train scales on the extent, as Dewey indicated.

However, training on the extent rather than the transformed data will include extraneous points. For example, the data below is st_crs(4326) data projected to st_crs(5070). The points are a grid along the extent in original coordinates.

Under the new transformation rule, the outermost grid points are included in the extent. This lets scales include a rather large part near the bottom where there is no geometry.

Image

I'm not a geospatial expert, so I don't know if this is an acceptable tradeoff.

teunbrand avatar Nov 21 '24 16:11 teunbrand

After discussion with Thomas we agreed to close this issue. The current situation works fine for most purposes and if {gganimate} wants to interpolate between projections it can subclass CoordSf for its own purposes.

teunbrand avatar Dec 03 '24 10:12 teunbrand