[Feature Request] automatically reproject to WGS84 if saving a file as GeoJSON

Open stevenlis opened this issue 6 years ago • 14 comments

geopandas: '0.5.0'

I had this projection issue as well before: https://stackoverflow.com/questions/56806907/gis-shapefile-converted-to-geojson-has-unexpected-coordinate-format

Since GeoJSON only supports WGS84, would it make more sense to automatically reproject the GeoDataFrame when it is saved as a GeoJSON file? And, of course, give users a warning.

https://macwright.org/2015/03/23/geojson-second-bite.html

stevenlis avatar Jun 29 '19 14:06 stevenlis

I don't really like automatic reprojection. It can cause issues if your CRS is not properly set, or if you intentionally want to use a different CRS. I would not decide on behalf of users what to do with their data. You can always do df.to_crs(epsg=4326).to_file("file.json", driver="GeoJSON")

Raising a warning makes sense, though.

martinfleis avatar Jun 30 '19 11:06 martinfleis

I disagree, I think most people want to export their geodataframe to geojson without having to know trivia like epsg:4326 is the only valid geojson projection. Users don’t get to use their own crs for geojson (according to the spec), so if their frame is not projected, we should assign 4326, and if it is projected we should be converting their data to the correct projection so that the output is valid with a warning that the tool ‘had to reproject from to epsg:4326’.

austinorr avatar Jun 30 '19 16:06 austinorr

Btw, is there any update regarding GeoJSON in geopandas? I remember that previously, if I saved a file as GeoJSON, read it back again, and checked the .crs, it would return None or nothing, but now the projection is actually saved?

stevenlis avatar Jun 30 '19 18:06 stevenlis

If that pattern is still true (read geojson has no .crs) then I think we should choose one of the following options: either to respect the specification of geojson and assume the epsg:4326; or go for explicit > implicit and try to read the geojson projection tag if it exists, else assign None. Either way we should also add a test to dump to geojson and reload to see if it preserves the crs correctly.
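
A minimal sketch of the second option (read the projection tag if it exists, else return None), using only the standard library. The helper name declared_crs_name is hypothetical and not part of geopandas or fiona; the "crs" layout shown is the legacy named-CRS member that writers such as GDAL have emitted, and the EPSG:3857 value is only example data.

```python
import json


def declared_crs_name(geojson_text):
    """Return the name in a legacy top-level "crs" member, or None if absent.

    Hypothetical helper for illustration; it only understands the
    {"type": "name", "properties": {"name": ...}} form.
    """
    doc = json.loads(geojson_text)
    crs = doc.get("crs")
    if not isinstance(crs, dict) or crs.get("type") != "name":
        return None
    props = crs.get("properties")
    if not isinstance(props, dict):
        return None
    return props.get("name")


# Example data: one document with a legacy crs tag, one without.
with_tag = json.dumps({
    "type": "FeatureCollection",
    "crs": {"type": "name", "properties": {"name": "urn:ogc:def:crs:EPSG::3857"}},
    "features": [],
})
without_tag = json.dumps({"type": "FeatureCollection", "features": []})

print(declared_crs_name(with_tag))
print(declared_crs_name(without_tag))
```

Under the first option, a caller would substitute the spec default wherever this returns None; a dump-and-reload test could then compare the result against the frame's original CRS.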

austinorr avatar Jun 30 '19 19:06 austinorr

Even though EPSG:4326 is the expected standard, you can lose quite a bit of accuracy converting from one projection to another. Additionally, the GeoJSON spec also states that other CRSs can be used:

However, where all involved parties have a prior arrangement, alternative coordinate reference systems can be used without risk of data being misinterpreted.

Since GDAL/OGR supports saving GeoJSON to different CRS, I think following their lead is probably the best way to go.

But, I wouldn't be opposed to raising a warning if you save to a projection that is not EPSG:4326.

snowman2 avatar Jun 30 '19 20:06 snowman2

ok...If GeoJSON actually supports storing crs information, then I think a warning is fine.

stevenlis avatar Jun 30 '19 20:06 stevenlis

@snowman2 I think it's worth pointing out that OGC urn:ogc:def:crs:OGC::CRS84 is the official system for GeoJSON, not EPSG 4326. Using the latter as shorthand for the former is going to be problematic in the future where systems based on GDAL 2 and GDAL 3 need to coexist, yes?

The sentence in the GeoJSON spec that you referenced means to acknowledge that projected data which happens to be structured like GeoJSON is often useful in some contexts. Rasterio's rio-shapes command lets a user output GeoJSON-like data for features extracted from a raster dataset in the projection of the raster, for the purpose of piping the data into rio-rasterize (for example) and not suffering any degradation of coordinates. It's not even necessary to put the CRS in the data for these applications as the CRS is communicated out of band. That's what "prior arrangement" means.

The sentence does not say that it's correct to write projected GeoJSON-like data, add a "crs" object to it, and call it "GeoJSON" with the expectation that all GeoJSON readers can understand it.

Summary: GeoPandas ought to let users export projected GeoJSON-like data for their own needs. A warning when doing so might be appropriate. Putting a CRS object on the GeoJSON like GDAL has done is no longer correct and reduces interoperability.

sgillies avatar Jun 30 '19 21:06 sgillies

Putting a CRS object on the GeoJSON like GDAL has done is no longer correct and reduces interoperability.

Is this something that should also change on the fiona side (it is fiona that lets GDAL write a GeoJSON file with a non-WGS84 crs)? Or do you see it as the responsibility of the user of fiona to not provide a crs in such a case? It would feel a bit strange for geopandas to add a special case checking for GeoJSON in to_file before passing the data to fiona (to not pass through the crs in that case).

jorisvandenbossche avatar Jun 30 '19 22:06 jorisvandenbossche

@sgillies, the clarification and the example of a non-standard use case were very useful.

I think it's worth pointing out that OGC urn:ogc:def:crs:OGC::CRS84 is the official system for GeoJSON, not EPSG 4326.

You are definitely correct, thanks for pointing this out :+1:.

I did some digging to better understand what you were saying:

>>> from pyproj import CRS
>>> urn_crs = CRS("urn:ogc:def:crs:OGC::CRS84")
>>> urn_crs
<Geographic 2D CRS: OGC:CRS84>
Name: WGS 84 (CRS84)
Axis Info [ellipsoidal]:
- Lon[east]: Geodetic longitude (degree)
- Lat[north]: Geodetic latitude (degree)
Area of Use:
- name: World
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

>>> urn_crs.to_epsg()
>>> urn_crs.to_authority()
('OGC', 'CRS84')
>>> epsg_crs = CRS("epsg:4326")
>>> epsg_crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
>>> urn_crs == epsg_crs
False

The main difference is the axis order. It appears that the OGC CRS is the same as +init=epsg:4326 due to the axis order; however, the +init=epsg:4326 syntax is deprecated in GDAL 3+/PROJ 6+.

>>> deprecated_crs = CRS("+init=epsg:4326")
ProjDeprecationWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method.
  ProjDeprecationWarning,
>>> deprecated_crs == urn_crs
True

Putting a CRS object on the GeoJSON like GDAL has done is no longer correct and reduces interoperability.

Although it is not officially supported, I think having some type of CRS information in the files is very useful when it is not urn:ogc:def:crs:OGC::CRS84 for non-standard use cases and it would be unfortunate to lose it. Is the main concern that it is ignored in most other programs and could cause confusion?

snowman2 avatar Jul 01 '19 00:07 snowman2

@snowman2 and @sgillies, thanks for setting me straight with the CRS84 and epsg:4326. In looking into the standard I had wrongly concluded that they were equivalent and missed the coordinate order inversion. Thanks for the examples to clearly illuminate the difference.

In rereading my response above I also see that I was terse and a bit stubborn in insisting that there’s only one valid projection; with a ‘prior arrangement’ one should be able to serialize data from any projection they please, and that’s a feature not a bug. My terseness was born from having been burned by projection guesswork before, and so my inclination is to favor a system that at least leaves a record and eliminates the guessing — but the spec basically says not to use a crs tag anymore...

Still not sure what the best solution is, but I do agree with others that it makes sense to raise a warning along the lines of ‘the file is not in the CRS84 projection and there could be ambiguity when reloading the file.’

austinorr avatar Jul 01 '19 21:07 austinorr

Note that CRS84 or epsg:4326 are (for better or worse) in practice equivalent within geopandas, as we currently always store data as (x, y), regardless of the order prescribed by the CRS.
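
To illustrate the point, here is a small standard-library sketch of how an (x, y) pair as stored would map to each authority's declared axis order. The lookup table and the function name to_authority_order are hypothetical; the axis orders themselves are the ones shown in the pyproj output earlier in the thread, and the coordinate values are made up.

```python
# Declared axis order per authority, as reported by pyproj above.
AUTHORITY_AXIS_ORDER = {
    "OGC:CRS84": ("lon", "lat"),
    "EPSG:4326": ("lat", "lon"),
}


def to_authority_order(x, y, crs_name):
    """Reorder a stored (x, y) = (lon, lat) pair into the CRS's declared order."""
    order = AUTHORITY_AXIS_ORDER[crs_name]
    return (x, y) if order == ("lon", "lat") else (y, x)


stored = (-122.4, 37.8)  # (x, y), i.e. (lon, lat)
print(to_authority_order(*stored, "OGC:CRS84"))
print(to_authority_order(*stored, "EPSG:4326"))
```

Because the stored pair never changes, the two labels only differ for software that honours the declared axis order.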

jorisvandenbossche avatar Jul 01 '19 22:07 jorisvandenbossche

Just want to mention this really tripped me up recently. All of our data is read in from GDB files that have a CRS explicitly set and we re-save them as geojson just after reading them in for compatibility reasons. We were working on trying to load some of these geojson files into Snowflake recently, which strictly follows the GeoJSON spec. We only realized that our data wasn't automatically re-projected into WGS84 when being saved as geojson, because Snowflake started throwing invalid coordinates errors.

The points presented above are well taken, but I do think that even if a warning isn't presented when reading/writing the files, a note in the GeoPandas docs about this situation would be really helpful to anyone in a similar position to myself and scratching their head before they stumble upon this thread.
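
For anyone in a similar position, a rough pre-flight check along these lines may catch the problem before a strict reader does. This is a heuristic sketch using only the standard library, not anything geopandas provides: the function names are hypothetical, it only handles geometries with a "coordinates" member, and values within the longitude/latitude range do not prove the data is actually WGS84.

```python
def iter_positions(coords):
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        # A single position: [x, y] or [x, y, z].
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def looks_like_lonlat(geometry):
    """Return True if every position fits within longitude/latitude bounds."""
    return all(
        -180 <= x <= 180 and -90 <= y <= 90
        for x, y in iter_positions(geometry["coordinates"])
    )


# Example data: a lon/lat point and a polygon in metre-based projected units.
point = {"type": "Point", "coordinates": [-77.03, 38.9]}
projected = {
    "type": "Polygon",
    "coordinates": [[
        [500000.0, 4649776.0],
        [500100.0, 4649776.0],
        [500100.0, 4649876.0],
        [500000.0, 4649776.0],
    ]],
}

print(looks_like_lonlat(point))
print(looks_like_lonlat(projected))
```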

The folks on my team are huge fans of GeoPandas and really appreciate all the hard work put into the library, so thank you!

ZeroCool2u avatar Jul 26 '22 18:07 ZeroCool2u

Hi @ZeroCool2u, thanks for the feedback on this. We are hopefully (finally) going to merge #416 in an upcoming release, which will add to_wgs84 as a keyword argument to to_json. The default behaviour (to_wgs84=None) will be to not reproject, to warn that this is not compliant with the GeoJSON spec, and to indicate that in future the default behaviour in geopandas will be to project to WGS84.
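
Roughly, the three-state keyword described here could behave like the sketch below. This is not the geopandas implementation: the function should_reproject, its crs_is_wgs84 parameter, the warning text, and the choice of FutureWarning are all assumptions made for illustration.

```python
import warnings


def should_reproject(to_wgs84=None, crs_is_wgs84=False):
    """Decide whether to reproject before serialising (hypothetical helper)."""
    if crs_is_wgs84:
        # Already compliant: nothing to do and nothing to warn about.
        return False
    if to_wgs84 is None:
        # Default: keep the data as-is, but tell the user what will change.
        warnings.warn(
            "Writing GeoJSON in a CRS other than WGS84 is not compliant with "
            "the GeoJSON spec; in future the default will be to reproject.",
            FutureWarning,
            stacklevel=2,
        )
        return False
    # Explicit True or False: respect the caller's choice without warning.
    return bool(to_wgs84)


print(should_reproject(to_wgs84=True))
print(should_reproject(to_wgs84=False))
```

Using None as the default lets an explicit to_wgs84=False stay silent, so users who intentionally write projected GeoJSON-like data are not nagged.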

m-richards avatar Jul 29 '22 10:07 m-richards