cartopy icon indicating copy to clipboard operation
cartopy copied to clipboard

Remove GEOS dependency in favor of Shapely

Open greglucas opened this issue 3 years ago • 2 comments

This removes the GEOS dependency and pushes the geometry handling into Shapely. It currently takes ~2x longer, so opening this for discussion purposes.

ping: @QuLogic, @jorisvandenbossche

For simple benchmarking I've been making the project_linear benchmark executable for quick runs by adding this to the bottom:

if __name__ == "__main__":
    s = Suite()
    p, r = "PlateCarree", "50m"
    s.setup(p, r)
    s.time_project_linear(s, r)

Then profiling

python -m cProfile -o nogeos.prof benchmarks/cases/project_linear.py
snakeviz nogeos.prof

greglucas avatar Sep 14 '22 14:09 greglucas

  1. The most time spent in Shapely seems to be in getting coordinates,

It might be easier to comment inline on a draft PR, but one idea to make this more efficient is to do src_coords = np.asarray(geometry.coords) (inside project_linear) and then further access coordinates in that numpy array, instead of accessing the coordinates from the coords object (since this has a getitem in python that has some overhead).

Good idea! That reduced it by ~20%. But, we are still creating a lot of Points and Linestrings to do the covers test further on, so we need to think about creating the array of all geometries as soon as possible if we can. This would also greatly help the pyproj transforms too...

greglucas avatar Sep 14 '22 14:09 greglucas

I'd be willing to pay a significant speed penalty (with the hopes of getting it back later) in order to remove the headache that is depending on Shapely while simultaneously linking to GEOS.

dopplershift avatar Sep 14 '22 19:09 dopplershift

If it means I don't have to build GEOS, I would take a 2x slowdown any day :)

varchasgopalaswamy avatar Oct 28 '22 00:10 varchasgopalaswamy

After a long delay, I finally got around to pushing up some commits I've had around for a while.

I added a new commit that transforms all of the source points in one call immediately to remove some of the repetitive transform(single_point) calls. However, this still only improves the performance by about 20% or so. Reducing the number of calls from 1.2M to 900K, indicating many of the calls are from within the bisect() routine itself.

If we want to be more like basemap and transform everything immediately and not interpolate between the transformed points, we can get a factor of 2x improvement. See this quick commit I was using for testing: https://github.com/greglucas/cartopy/commit/c052f63687355147dcfcaee6be51e9e8872192be

My project_linear timings were: Removing GEOS and only using Shapely: 9 seconds Transforming the points as early as possible: 8 seconds Removing interpolation/bisection between points: 4 seconds

On the final one where we've removed all interpolation/bisection, the slowest functions are all within Shapely, so we could focus effort in refactoring the shapely geometry handling then.

greglucas avatar Nov 16 '22 03:11 greglucas

I think the bisection/interpolation was an intentional feature of Cartopy, so we shouldn't just blindly remove it. How hard would it be to add an option (somewhere) to disable it?

The 20% sounds gain sounds great, and honestly the win in terms of making this project more sustainable, simplifying packaging, and eliminating a solid 50% of the support questions is worth the current performance impact.

dopplershift avatar Nov 16 '22 18:11 dopplershift

:100: agree on all fronts. I'm hopeful that this can be a short-term hit in performance, but lead to a more sustainable project and getting people to be able to contribute.

I think the bisection/interpolation was an intentional feature of Cartopy, so we shouldn't just blindly remove it. How hard would it be to add an option (somewhere) to disable it?

I agree we should not remove it (I have some ideas for how to do it better still), that commit is just a demonstration of how it could work and the speed we can gain. Thinking more about it, it is analogous to the transform_first keyword argument we added to the contour functions. https://github.com/SciTools/cartopy/blob/8c31a157154291d35dd295ad81f38b80cce78ac9/lib/cartopy/mpl/geoaxes.py#L322-L334 So, I can see adding that decorator to more functions, or even bringing that up a level and adding a property to Projection that would signal us to transform things immediately for the user.

greglucas avatar Nov 16 '22 18:11 greglucas

That seems reasonable.

dopplershift avatar Nov 16 '22 21:11 dopplershift

Any other thoughts from people on this PR and the trade-off of the speed decrease vs ease of installation?

@QuLogic this was mostly your work, so did you have a preference one way or the other on it?

greglucas avatar Nov 23 '22 16:11 greglucas

Just a ping to any interested parties for comment (@QuLogic ) since the release of Shapely 2.0 is imminent.

dopplershift avatar Dec 07 '22 16:12 dopplershift

As another speed-up idea, I am wondering: how many of the projections / CRS classes have a domain that is rectangular (bounding box) ? Because if a significant portion is limited to that case, I think it would be relatively straightforward to have a fast path for that case for some of the shapely interactions (eg checking if a point is in a bbox is easy and could be done without shapely, avoiding the overhead of shapely in this fast path)

jorisvandenbossche avatar Dec 07 '22 23:12 jorisvandenbossche

I like all of these thoughts for speed-ups! My question is do we want to keep implementing the speed-ups before accepting a PR, or do we merge something a bit slower for now and hope that people open PRs with improvements in the future?

My take is that it would be good to get this in and then start whittling away on the speed-ups later (don't let perfect be the enemy of good). There are quite a few speed-ups to be had and I think they can be done in parallel by people with expertise in different areas too.

(I've convinced myself that our bisection routines are bad performance-wise and not true bisection and that is the root of the problem. We go back and forth bisecting between points until a straight-enough segment is found, then set that to the new start or end point and restart bisecting. This means if we've already overshot in a previous iteration, we could try that same point again in the next iteration. I think we should really do recursive bisection adding points as we go down the recursive path avoiding any duplicate project_point + overlaps calls.)

greglucas avatar Dec 08 '22 17:12 greglucas

If someone has plans to immediately implement some speed-ups, I'd entertain waiting. Otherwise, while I'm sure we'll catch some flack for being even slower, I think finally being able to upload wheels and eliminating whole classes of installation issue outweighs that. Maybe it being released slow will motivate some contributions. 😉

dopplershift avatar Dec 08 '22 17:12 dopplershift

Maybe it being released slow will motivate some contributions

One can dream 😄 (that is my hope too)

greglucas avatar Dec 08 '22 17:12 greglucas

I think finally being able to upload wheels

We, conda uses, underestimate the value of this. Cartopy really need wheels ASAP!

ocefpaf avatar Dec 08 '22 18:12 ocefpaf

Latest commit adds the version gate to avoid the create/destroy Point geometry as @jorisvandenbossche suggested. I wasn't able to see much of a speed-up locally with that though.

There is another LineString create/destroy for the covers()/disjoint() calls that is likely still the slow path.

greglucas avatar Dec 08 '22 18:12 greglucas

My question is do we want to keep implementing the speed-ups before accepting a PR

My suggestions were certainly just ideas from looking at the code that I wanted to note, not things that have to be done in this PR (and anyway it's not up to me to decide on that ;))

I would personally also say that being able to drop the build dependency on GEOS will be a huge benefit for installation, and seems worth some (temporary) speed degradation.

jorisvandenbossche avatar Dec 08 '22 21:12 jorisvandenbossche

I've taken a stab at upstreaming some performance updates to pyproj, which would be able to more than offset this (i.e. faster than current main). We are definitely taking a performance hit here, but I still think the benefits are worth it and then we can start whittling away on other performance improvements.

Anyone willing to review/approve and hit the merge button ;) ?

greglucas avatar Dec 16 '22 18:12 greglucas

I'm willing to hit it as soon as I carve out some cycles to actually review the code.

dopplershift avatar Dec 21 '22 18:12 dopplershift

@bjlittle does IRIS have any issues with removing GEOS? Does this slow things down too much, and are there any additional failures that this introduces for all of you?

greglucas avatar Mar 12 '23 16:03 greglucas

@bjlittle does IRIS have any issues with removing GEOS? Does this slow things down too much, and are there any additional failures that this introduces for all of you?

We've got no direct need for geos on SciTools/iris :+1:

bjlittle avatar Mar 12 '23 17:03 bjlittle

upload wheels

I'm happy to take that on... as long as I wouldn't be stepping on anyone's toes?

bjlittle avatar Mar 12 '23 17:03 bjlittle

@bjlittle does IRIS have any issues with removing GEOS? Does this slow things down too much, and are there any additional failures that this introduces for all of you?

@greglucas We also have asv benchmarking enabled on SciTools/iris, so we'll soon see whether there are any performance regressions.

@trexfeathers Would it be possible to benchmark against this PR pre-merge? That might be pretty informative. Fancy making that happen and reporting the outcome, if you have capacity?

bjlittle avatar Mar 12 '23 23:03 bjlittle

Does anyone have other thoughts on this PR? What performance penalty is the maximum we would take for easier installations, or what else we would need to consider to keep this moving forward?

greglucas avatar Jun 02 '23 01:06 greglucas

As a user I would say < 2x performance hit is more than acceptable given the ease of installation. Also, once it's out in the field it'll be easier to notice where the performance hits are really mattering, right?

varchasgopalaswamy avatar Jun 02 '23 12:06 varchasgopalaswamy

Does anyone have other thoughts on this PR? What performance penalty is the maximum we would take for easier installations, or what else we would need to consider to keep this moving forward?

In all honesty, given the resources we have for maintaining this project, I think the only requirement I have is that it still works. This is a huge win for packaging and maintainability.

dopplershift avatar Jun 07 '23 22:06 dopplershift

OK, some updated numbers for everyone then.

Running the benchmark from the initial comment (project_linear), I get 1.85s on upstream/main and 2.35s on this branch. So, much less than a factor of 2 even. Curiously, it is a bit slower on shapely 2.0 compared to 1.8.5, which probably means there are even more speed-ups to be had if we look at refactoring some of the array coordinate usage throughout the library.

Running the test suite locally I get within a second of each other. On CI it looks like there is ~50% difference between this branch and the most recent merge to main. (Look at the testing time explicitly because the installation time takes forever) main: https://github.com/SciTools/cartopy/actions/runs/5205628895/jobs/9391295722 this branch: https://github.com/SciTools/cartopy/actions/runs/5217703945/jobs/9417803995

greglucas avatar Jun 09 '23 02:06 greglucas

Sounds more than acceptable to me !

varchasgopalaswamy avatar Jun 09 '23 03:06 varchasgopalaswamy

In all honesty, given the resources we have for maintaining this project, I think the only requirement I have is that it still works. This is a huge win for packaging and maintainability.

It looks like I do have the ability to merge without any approvals. With the lack of people and time to review this PR I think we should push forward and merge this since we have new numpy, scipy, and matplotlib releases recently that it would be nice to push an update out for.

I'll give this 2 more weeks and someone can throw a request changes to block me from self-merging if they have strong opinions against that, otherwise since no one has spoken up adamantly against this I think it is time to move forward with getting this in.

greglucas avatar Jun 25 '23 21:06 greglucas