spherely icon indicating copy to clipboard operation
spherely copied to clipboard

Add geometry (scalar) creation functions

Open benbovy opened this issue 1 year ago • 5 comments

This PR refactors the Geography (sub)classes in spherely and adds geography creation scalar functions.

Supersedes #26. Closes #41.

  • Goal is to remove the Geography subclasses (Point, Polygon, etc.): only one Geography wrapper class remains
  • Geography type (and also whether the geography is empty or not) is inferred at object creation
  • Geography objects are created via the top level creation functions (only scalar for now except for points)

TODO:

  • [ ] fix segfault when creating a geography collection itself including collection(s)
  • [ ] add EMPTY geographies creation function overloads (+ tests errors when trying to build MULTI geographies from EMPTY geographies)

benbovy avatar Oct 04 '24 13:10 benbovy

Yes I've been working further on this (last updates not uploaded yet) but that might be too much of a change indeed. I'll restrict this PR to the refactoring of Geography wrapper classes.

Also I'm not exactly sure whether we should closely reuse Shapely's API for creation functions. I find that API convoluted with the mixing of a unique coords argument vs. separate x, y (and z) arguments as well as the different behavior depending on indices. We should probably reuse the same API for compatibility although it makes things more cumbersome to implement.

benbovy avatar Oct 09 '24 13:10 benbovy

Yeah, the vectorized creation functions in shapely are quite complex, and I am also not sure if they are always that useful (like creating polygons from a single 3D array, which I think in practice is only useful to generate some random triangles or to create bboxes), and agreed we should carefully think about which API we exactly want here. That's also the reason I mentioned leaving out the vectorized creation functions for now, and only providing the scalars ones (that already exist and that are used in the tests). For points, I would keep the vectorized function you already have, since in this case it is quite simple, I think? (and you already have that)

jorisvandenbossche avatar Oct 09 '24 13:10 jorisvandenbossche

For points, I would keep the vectorized function you already have, since in this case it is quite simple, I think? (and you already have that)

Yes. For a single array this is currently limited to 2D (i.e., not a true ufunc like in shapely) but that's fine for now. I don't think that we can use pybind11.vectorize to extend support to a single nD input array (https://github.com/pybind/pybind11/issues/1294).

I wouldn't be too hard I think to mimic the behavior of a ufunc by relying on pybind11's array view & indexing capabilities (+ some manual computation of nD indices), though. That would be even easier to let xtensor(-python) takes care of all of that (also for creating temporary arrays of geographies on the C++ side while releasing the GIL), but I'm not sure if adding such build dependencies is worth it.

benbovy avatar Oct 09 '24 13:10 benbovy

Nice progress! One thought on the API: given that we don't follow exact shapely anymore with the subclass constructors (shapely.Point(..) etc), we could also follow the naming of s2 and bigquery and others for the scalar creation functions, and use something like make_point() instead of point(), etc. (although I see they are also not consistent in it, as they have st_geogpoint and st_makeline/st_makepolygon ...)

(I am not necessarily convinced myself that this is better (it's longer to type, but tab completion will put them all together, and there is a clearer distinction from being a type), mostly just mentioning it in case you would have a stronger opinion)

jorisvandenbossche avatar Oct 11 '24 15:10 jorisvandenbossche

Thanks for the review comments @jorisvandenbossche.

I also have been thinking about the naming... Likewise I'm still not fully convinced myself and I don't have strong opinions either.

We could also use the create_ prefix for what we call "geography creation" functions...

benbovy avatar Oct 11 '24 21:10 benbovy

This is getting close. I'd just like to investigate the segfault when creating nested geography collections before making this PR ready (EDIT: fixed segfault).

A few other comments that we can address later:

  • Empty geographies: there is no overload function to create MULTI geographies without argument (or with None) ; also when empty or 1-length sequences are given the multi-types are demoted (see https://github.com/paleolimbot/s2geography/issues/17)
    • I guess the issue is the same in #50 and maybe #51 ?
    • We could address this in Spherely by forcing the geography type in the spherely.Geography wrapper, but it depends on what we want to do.
  • The collection type is named GeographyType::GeographyCollection but both the Geography repr and WKT have the GEOMETRYCOLLECTION name. Do we want to uniformize things and rename GeographyType::GeographyCollection to GeographyType::GeometryCollection?
  • Full polygon not supported (yet)
  • It is not currently possible to create a MULTIPOLYGON with duplicate rings, due to the fact that s2geography::PolygonGeography reuse the same underlying S2Polygon for multi-polygons. Maybe if we deactivate s2 normalization and/or validity check it will pass but will it work ?
  • No test added for vectorised point creation functions, but the API may not be final.
  • Scalar creation function naming: still no strong opinion, although I would lean towards either point(), linestring(), etc. or create_point(), create_linestring(), etc.

benbovy avatar Oct 15 '24 14:10 benbovy

This is getting close.

A few other comments that we can address later:

Agreed! Let's indeed get this in (to have the basics working, and will also be easier for other PRs adding features), and the things below can be handled in follow-ups.

Personally I am fine with that limitation for now, especially given that s2geography essentially does not make the distinction between single and multi geometries anyway (as you linked), and that it is only for output (like WKT, or asking for the type) that the distinction is introduced, based on the number of elements stored in the object.

  • The collection type is named GeographyType::GeographyCollection but both the Geography repr and WKT have the GEOMETRYCOLLECTION name. Do we want to uniformize things and rename GeographyType::GeographyCollection to GeographyType::GeometryCollection?

I would maybe lean towards GeographyType::GeometryCollection, although that is certainly odd as well ..

Another case where this occurs is in the spherely.geography_collection(..) constructor. For that case, we could also shorten it to just collection() to avoid that issue? (or at least when we add a prefix like create_.., that might be sufficient)

  • It is not currently possible to create a MULTIPOLYGON with duplicate rings, due to the fact that s2geography::PolygonGeography reuse the same underlying S2Polygon for multi-polygons. Maybe if we deactivate s2 normalization and/or validity check it will pass but will it work ?

A multipolygon with duplicate rings also sounds like "wrong" .. (in shapely / GEOS, that would be an invalid geometry anyway, and thus give unexpected results), so I am personally fine with not allowing to create it.

Although I suppose at some point we should allow to create invalid geographies, end then have a method to check if they are valid / what the reason is for the invalidity (for example s2_make_polygon has a check keyword to disable validation when constructing a polygon)

This is also another case (together with the geometrycollection / geographycollection above) where it points the the mismatch between the "simple features" model we try to emulate and how s2 works. For the constructor functions you implemented here, it is nice to make the distinction between all the types, I think (as it gives a more user friendly interface), but for accessors like the geography type, we could also stay more closely to s2geography and have a function that just returns point/line/polygon/collection

  • Scalar creation function naming: still no strong opinion, although I would lean towards either point(), linestring(), etc. or create_point(), create_linestring(), etc.

Let's open a separate issue for this and see if we can get some more opinions

jorisvandenbossche avatar Oct 18 '24 13:10 jorisvandenbossche

I would maybe lean towards GeographyType::GeometryCollection, although that is certainly odd as well ..

Another case where this occurs is in the spherely.geography_collection(..) constructor. For that case, we could also shorten it to just collection() to avoid that issue?

I renamed GeographyType::GeographyCollection -> GeographyType::GeometryCollection and geography_collection() to collection(), I also think it looks nicer for now.

benbovy avatar Oct 21 '24 15:10 benbovy

OK, merged this, so we can move forward with other PRs and follow-ups. Thanks @benbovy!

jorisvandenbossche avatar Oct 22 '24 08:10 jorisvandenbossche

👍 thanks for the reviews @jorisvandenbossche !

benbovy avatar Oct 22 '24 09:10 benbovy