Minimizing incompatibility between cartopy and shapely
Cartopy has a module that both imports Shapely and adds new GEOS bindings: https://github.com/SciTools/cartopy/blob/master/lib/cartopy/trace.pyx. I think this is an unsound approach that can lead innocent cartopy users into DLL Hell.
I'd like to propose two better approaches.
Move all of cartopy's GEOS dependency into Shapely
The extension code in trace.pyx is rather small and could be moved to Shapely and deleted from cartopy. The DLL Hellmouth would be closed and cartopy users could benefit from binary Shapely wheels. There's still time to do this before Shapely 1.6.0.
Vendorize Shapely
Cartopy could copy the Shapely source and add build steps for it into setup.py, removing it from the list of dependencies that would be resolved by pip or setuptools. The internal copy of Shapely would be built using the same geos_c as cartopy's trace module and the DLL Hellmouth would thus be closed.
@QuLogic @pelson I see in other tickets here that conda is what you're focusing on. Neither of my proposed approaches is needed for conda, though moving the extension code into Shapely will have the benefit of removing calls into Shapely's private internal API and thus making cartopy more forward compatible with Shapely. I think that's probably the better approach for making pip install cartopy work on OS X.
👍 to upstreaming things.
I really only use Fedora packages (which are almost always in sync), and only point to conda because it's the only one I know of that works on Macs. I'm all for moving things into shapely if that's the better location for things.
Howdy, for the past month or so I've been working on a high-level plotting package which extends cartopy and uses geopandas for input. The DLL issues stopped development very early in the process, but then I was somehow (through plodding around enough, I suppose) able to arrive at a Mac OSX environment in which everything "just worked".
Fast forward a month later, and geoplot is (at least initially) done (some examples are in the docs). Returning to the platform issue, however, I was not even able to reproduce the working environment I had generated earlier! I can't distribute the codebase, because I don't know of a way of getting all of the components installed successfully.
There are a few other complications, but this issue is the core of it. So I'm obviously interested in helping to resolve it. I could use some direction from you folks in doing so, since I have significantly more time than experience.
To begin with, right now I'm trying to get the shapely and cartopy libraries built from source in the same environment and passing all of their tests. @QuLogic how do you run your cartopy tests? Running nosetests in the command line (post-python setup.py install) errors out for me with ImportError: No module named cartopy._crs.
@ResidentMario I don't usually rebuild my environment, but when I do, I basically follow the steps from .travis.yml (except for not re-downloading conda.)
I think it's time to discuss this more--it's now impossible to build cartopy on travis (linux) against the 1.5.17 manylinux wheels that have been uploaded. In fact, I have to work harder to get pip to ignore those wheels.
Given that shapely is a full-fledged wrapper around GEOS, I think the sane solution is to add any APIs that cartopy needs to Shapely and eliminate cartopy's direct dependence on GEOS. Can someone on the CartoPy side weigh in on whether such a move would be supported? I'm willing to put in the work needed, but can only afford to do so if the effort won't go to waste.
I don't see a patch doing what you've just described being rejected, given the concurring opinions of a cartopy core dev and the fact that it was originally proposed by the core shapely maintainer.
Thanks @dopplershift - I'm in favour of doing what we can to improve ease of use and compatibility, and it is great you are keen to help.
I think a good starting point would be to record here what the main changes will be for cartopy (i.e. what moves from cartopy to shapely, what needs to change in cartopy to accommodate this) and what (if any) will be the differences from an end-user perspective.
Has there been any progress on this?
Edit: There's a licensing incompatibility, see shapely #451 (again).
License incompatibility. Of course there is.
I've used the GPL before and have nothing against it, but it does mean that Shapely cannot lift the code from Cartopy. We'd need the copyright holder, British Crown Copyright 2011 - 2016, Met Office, to donate a new version of trace.pyx with a permissive license.
I only hold the (L)GPL against people who assert that it doesn't create any problems--ones like this for instance.
That's not really something that's unique to (L)GPL, so let's not get off topic here. At least we have one advantage with the CLA here; there's only one "person" to ask about re-licensing. But I think first we need to figure out what part.
From a quick scan, all of the Met Office projects seem to be GPL or LGPL, so I think this is just their policy on things. They can do that because of UK crown copyright. AFAIK, all GitHub code published by e.g. NOAA in the US is unlicensed by default because federal law compels most work released by federal agencies to be public domain.
@QuLogic It seems we just want them to release trace.pyx, no?
I've used the GPL before and have nothing against it, but it does mean that Shapely cannot lift the code from Cartopy. We'd need the copyright holder, British Crown Copyright 2011 - 2016, Met Office, to donate a new version of trace.pyx with a permissive license.
I'll pick this up and get it looked into.
In principle, I see no problems. I expect it is just some administration, but I will report back.
@sgillies @ResidentMario is https://github.com/Toblerity/Shapely/issues/451 in your view the set of changes to make, or is that just part of the work?
thank you
hello @sgillies
I have looked into this and there are no issues with relicensing trace.pyx and moving it to shapely https://github.com/Toblerity/Shapely/pull/479
Regarding licensing, @QuLogic: you are the only contributor not already covered by the copyright, so approval from you to relicense your code from LGPL to BSD is required in order for this code to be donated. A comment on this issue and that PR is sufficient.
Is Cartopy not covered by the CLA I signed?
Never mind, I forgot that was an LGPL-only grant. I agree with this re-licensing for Shapely.
I've overlooked that _trace.pyx uses both GEOS and PROJ.4 APIs. I think a refactor would be needed before the GEOS calling code could be moved to Shapely, and I don't want to insist on such a refactor for Cartopy. Vendorizing Shapely may be the best option after all, yes?
Hi @sgillies,
Sorry for the radio silence on this 📻 . I've been somewhat pre-occupied with fluffy animals down-under recently 🐨 🙃 (I have been seconded in Australia for 6 months). I'm now back the right way up, and I'm keen to solve this issue once and for all.
trace.pyx is a combination of GEOS and PROJ.4 - its responsibility is to produce interpolated geometries in their target projection. A good deal of the code was written some time ago, and although it has held up well, it is in some need of love. As I recall, the reason we ended up using GEOS directly was that there were a number of bottlenecks in Shapely's geometry creation (even in Cython) that we were able to significantly optimise by making prior assumptions about the incoming coordinate sequences.
I'd like to split the conversation at this point into two threads: what we do in the short term; and what cartopy does in the long term.
What do we do in the short term?
The fact that Shapely's wheels bundle GEOS, and that cartopy depends on both shapely and GEOS means that cartopy MUST either:
- be using exactly the same version of GEOS at build and run time as is used in the shapely wheels
- vendor its own version of Shapely so that it is decoupled from the GEOS version being used by the official Shapely wheels
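The first option could be enforced with a guard along these lines. This is only a sketch: `parse_geos_version` and `require_same_geos` are hypothetical helper names, not part of cartopy or Shapely, and it assumes the version text has the shape that the GEOS C API's `GEOSversion()` returns (for example `3.5.0-CAPI-1.9.0 r4084`).

```python
import re


def parse_geos_version(text):
    """Extract (major, minor, patch) from a GEOS version string.

    Assumes the string starts with three dot-separated integers,
    e.g. '3.5.0-CAPI-1.9.0 r4084'.
    """
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognised GEOS version string: %r" % (text,))
    return tuple(int(part) for part in match.groups())


def require_same_geos(build_version, runtime_version):
    """Raise if the GEOS seen at build time differs from the one at run time.

    Both arguments are version strings. How each is obtained is left open:
    the build-time value would have to be recorded by the build, and the
    run-time value queried from whichever GEOS is actually loaded.
    """
    built = parse_geos_version(build_version)
    running = parse_geos_version(runtime_version)
    if built != running:
        raise RuntimeError(
            "GEOS mismatch: built against %s, running with %s"
            % (".".join(map(str, built)), ".".join(map(str, running)))
        )
    return running
```

A check like this only detects the mismatch at import time; it does nothing to prevent it, which is part of why this option is brittle.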
Unfortunately we are in this situation because bundling third party dependencies in a binary format designed to contain a single package, such as is done with binary wheels, takes away a means for us to express dependencies fully. This is not a criticism of Shapely having done so - it is the only solution that is currently available to us for distributing binary packages through pip, and is the reason I continue to advocate the use of conda or a user's system package manager (the pip interface is excellent, and there are downsides to conda too, it is just that Python packaging doesn't deal with the concept of non-python packages well IMO).
Anyway, this is not intended to be a rant (and I definitely don't want to criticise what I actually believe is the best tool for the job when it comes to pure python packages) - we are where we are, and I'm keen to find a sustainable solution.
At this point, though it is my least favourite option, I believe vendoring shapely is the less brittle of the two options. The one question I have is whether the use of two potentially different versions of GEOS (the one from cartopy.shapely and the one from shapely) is likely to see shared symbol issues for GEOS functions? (shapely isn't statically linked in the wheels, right?)
What does cartopy do in the long term?
In the long term I can see a few feasible options:
- cartopy feeds any optimisation steps back into Shapely, rather than circumventing it, and uses Cython-level Shapely APIs exclusively (downside: long term we might want to be looking at things like numba for our optimisations, which means low-level calls to GEOS (or a numba-shaped shapely) are going to be pretty important)
- GEOS is shipped as its own wheel, and shapely depends upon that. It would enable cartopy to express its dependency on (specific versions of) both shapely and GEOS, and pip can do its magic on resolving the dependencies correctly.
- the trace part of cartopy is hived off into its own package, and it makes no use of shapely whatsoever - GEOS stuff only. Cartopy could then depend on it and shapely at the Python level [if the option of cartopy vendoring shapely is going to work, so would this].
At this point, [naturally] I prefer the idea of moving the optimisations out of cartopy and into Shapely. Not entirely unrelatedly, I'm expecting to be putting a significant effort into the trace code in the next 6-12 months in order to solidify cartopy for a major release. I think doing this will inform us some more about the direction of travel for cartopy in the long-term.
This is somewhat longer than my normal comments, hope you're still with me! Would love to get your feedback/thoughts.
Cheers,
Isn't vendoring shapely simply going to shift the problem onto cartopy users who are using shapely manually? Wouldn't this:
import cartopy
import shapely
still only load one copy of the geos library, which will break one of those two?
Thanks for putting my question more succinctly @dopplershift. That is the key question we need to be able to answer, otherwise at least one (but probably both cartopy and shapely) will need to vendor GEOS as well (and have to do unpleasant things with its namespace). Nobody wants to be doing that though, so we should get a concrete grip/PoC on answering that question.
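One way to get a first data point for that PoC, on Linux only, is to look at which GEOS shared objects are actually mapped into the process after both imports. The sketch below assumes the `/proc/<pid>/maps` format; `loaded_libraries` and `loaded_in_this_process` are hypothetical helper names.

```python
import os


def loaded_libraries(needle, maps_text):
    """Return sorted, de-duplicated file paths from a /proc/<pid>/maps dump
    whose file name contains `needle`."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [pathname].
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        if needle in os.path.basename(path):
            paths.add(path)
    return sorted(paths)


def loaded_in_this_process(needle):
    """Linux-only: inspect the current process; empty list elsewhere."""
    try:
        with open("/proc/self/maps") as handle:
            return loaded_libraries(needle, handle.read())
    except OSError:
        return []
```

Calling `loaded_in_this_process("geos")` after `import cartopy` and `import shapely` would show whether one or two GEOS files are mapped. Two distinct paths would mean two copies are loaded, but that alone does not say which copy each symbol lookup resolves to, so it is a starting point rather than an answer.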
@marqh already opened Toblerity/Shapely#479 which should be the long-term solution; he just didn't finish cleaning it up. Is that not enough?
In geopandas we are currently working on a GeometryArray object to be able to have fast vectorized shapely operations on an array of geometries (see https://github.com/geopandas/geopandas/issues/473 and https://github.com/geopandas/geopandas/issues/430). So we will also have C extensions using GEOS and also shapely.
Therefore, just want to say that I am very interested in how this discussion goes along.
If it's any incentive, this issue is a primary blocker on potentially bundling geoplot as a viz suite in geopandas. :)
@jorisvandenbossche's datapoint is an interesting one - whilst I can see things moving towards shapely as the one-true GEOS wrapper, it is unsustainable for shapely to subsume functionality that isn't its core capability. With that in mind, I'm moving towards the idea that the healthiest long-term solution is for shapely's binary wheels to either statically link against GEOS, or to bundle a namespace mangled GEOS that doesn't expose third-party libraries to shapely's version.
On linux, I believe this is precisely what is already happening (through auditwheel?). On OSX, I'm not so sure. @sgillies - are you using delocate at all when producing the wheels?
@pelson yes, I'm using delocate for the macosx wheels. It doesn't mangle the dylib name as auditwheel does, or didn't as of version 0.6.
Static GEOS isn't going to work for Shapely because it relies on ctypes and dlopen.
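To illustrate the ctypes point: loading through ctypes needs a shared library on disk to open at run time, so there is nothing to bind to if GEOS has been statically linked into some other extension. A minimal sketch, not Shapely's actual loading code; `geos_version_via_ctypes` is a hypothetical name, and the only GEOS symbol used is `GEOSversion` from the C API.

```python
import ctypes
from ctypes.util import find_library


def geos_version_via_ctypes():
    """Return the version string of a system libgeos_c, or None if no
    shared library can be located and opened."""
    name = find_library("geos_c")
    if name is None:
        return None  # no shared libgeos_c visible to the loader
    try:
        lib = ctypes.CDLL(name)  # dlopen() on POSIX systems
    except OSError:
        return None
    # GEOSversion() returns a const char*; declare that for ctypes.
    lib.GEOSversion.restype = ctypes.c_char_p
    lib.GEOSversion.argtypes = []
    return lib.GEOSversion().decode("ascii", "replace")
```

On a machine without a shared GEOS the function returns `None`, which is the situation a ctypes-based wrapper would face if the only GEOS available were a static one.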
solution is for shapely's binary wheels [..] to bundle a namespace mangled GEOS that doesn't expose third-party libraries to shapely's version.
Question from a novice in these matters: doesn't that cause problems when geopandas still uses shapely? E.g. when we box a GEOSGeometry object stored in a geopandas series (for which we work on the pointers with 'our' GEOS) to a shapely geometry?
I guess another option would be to come up with a third python library that just wraps GEOS and exposes the necessary low-level C++ API via cython. Both shapely and cartopy could separately depend on this new library. Not sure if that would necessarily be less work, since you'd need to refactor both shapely and cartopy, but at least you'd avoid making shapely the de facto python GEOS wrapper if shapely doesn't want to be that.
Oops, that PR was not supposed to close this issue; it was only a cross-reference.
As a followup to #1251, I have a branch that converts to Shapely instead of GEOS directly, but as I recall, it was quite a bit slower. Perhaps using pygeos might work better.