cartopy icon indicating copy to clipboard operation
cartopy copied to clipboard

Major slowdown of app after upgrading to cartopy 0.20 that uses pyproj

Open andhuang-CLGX opened this issue 3 years ago • 32 comments

I am still investigating but I found that somewhere in my app, the projecting is taking 97% of the load time.

 97.00%  97.00%   322.7s    322.8s   __init__ (pyproj/crs/crs.py:313)
  1.00%   1.00%   14.25s    14.25s   equals (pyproj/crs/crs.py:933)

Will update this as I figure out where it's coming from.

Some context, I am using geoviews, datashader to plot millions of points. I am not sure if geoviews / datashader is using an outdated cartopy method to transform the points; will test.

andhuang-CLGX avatar Oct 01 '21 14:10 andhuang-CLGX

Seems like holoviews / geoviews image

andhuang-CLGX avatar Oct 01 '21 15:10 andhuang-CLGX

Are you setting PYPROJ_GLOBAL_CONTEXT=ON? See the note on the new docs about that speed-up: https://scitools.org.uk/cartopy/docs/latest/index.html

However, this looks like a lot of pyproj CRS objects are being created, rather than re-using the same object which would be causing the slowdown. I wonder if we should look at putting a cache on the CRS constructors in addition to the Transformers?

# Many calls like this
for i in range(100):
    ax.scatter(..., transform=ccrs.PlateCarree())

# rather than storing the object
pc = ccrs.PlateCarree()
for i in range(100):
    ax.plot(..., transform=pc)

greglucas avatar Oct 01 '21 15:10 greglucas

This is a minimal example:

import geoviews as gv
import cartopy.crs as ccrs
gv.extension("bokeh")

gv.tile_sources.CartoDark() * gv.Points(([-88, -95, -100], [40, 50, 60]), crs=ccrs.PlateCarree())

This takes 24 seconds to run on cartopy==0.20, and less than 1 second (0:00:00.183278) on cartopy==0.19.0

andhuang-CLGX avatar Oct 01 '21 15:10 andhuang-CLGX

I met the same question.

leeyupeng119 avatar Oct 08 '21 14:10 leeyupeng119

@leeyupeng119 and @andhuang-CLGX: Have you tried setting PYPROJ_GLOBAL_CONTEXT=ON in the environment, like @greglucas suggested, and see if that improves things?

dopplershift avatar Oct 08 '21 18:10 dopplershift

Yep I have tried that. https://github.com/holoviz/geoviews/issues/529#issuecomment-934661990 geoviews may implement their own solution in the near future.

andhuang-CLGX avatar Oct 08 '21 20:10 andhuang-CLGX

Hi All

this seems like a pretty major performance hit on the 0.19 0.20 transition

does it seem like this is an edge case? is this a performance hit that could be seen (oom) for many cases?

Has any consideration has been given to providing access to the proj transforms for users, enabling them to opt out of the pyproj transforms if they desire?

I'm a bit worried that the documentation note:

The v0.20 release uses pyproj for transformations, which could be slower in some situations. doesn't provide sufficient information given the magnitude of performance hit seen here

I feel that there is more that should be considered here than only PYPROJ_GLOBAL_CONTEXT=ON

many thanks marqh

marqh avatar Oct 14 '21 14:10 marqh

I'm happy to consider those kinds of performance improvements, but any such addition needs to consider the maintainability and sustainability of the project. We have a minimal set of maintainers on this project. Updating to the new PROJ API languished for years until Cartopy's inability to work with current versions of PROJ became such a problem and headache for the community writ large that @snowman2 stepped in to do the update.

So bluntly, without a significant and trusted guarantee that whoever adds direct builds against PROJ will stick around and maintain it long term, I can't see how merging it would be in the best interest of the project. I think the right solution is to figure out where the bottlenecks are and add optimizations (or new APIs) to pyproj.

dopplershift avatar Oct 15 '21 20:10 dopplershift

Here are performance tips identified in the pyproj development:

  1. Using the global context for creating several PROJ objects ref can improve performance. However, it is not thread safe. So, pyproj makes you enable it if you know your application does not use threading. That is why the PYPROJ_GLOBAL_CONTEXT=ON recommendation exists for cartopy.
  2. PROJ 8.1+ has speedups. If you aren't using it yet, I recommend upgrading. https://github.com/OSGeo/PROJ/pull/2738#issuecomment-855329515, https://github.com/OSGeo/PROJ/pull/2787.
  3. Repeated transformations ref. This is why there is now a Transformer class in pyproj.

snowman2 avatar Oct 16 '21 00:10 snowman2

i agree with @dopplershift and the comments on the importance of maintainability

i would also like to recognise the contributions of @snowman2 in delivering the updates to get the code base working with up to date proj

i think that it would be worthwhile exploring potential benefit opportunity, whilst keeping in mind that detailed optimisation tends to lead to complexity which is a maintainability headache.

i wonder what practical steps could be identified to see if this performance can be improved without major code base maintenance issues.

  • For example, is the pattern highlighted by @greglucas https://github.com/SciTools/cartopy/issues/1895#issuecomment-932321937 of creating many transform objects:
    • [ ] embedded in the Cartopy codebase?
      • could we identify where and make point wise changes?
    • [ ] mainly a usage pattern?
      • could we enhance the usage documentation with some tips on performance improvements?
    • [ ] Is the notion of caching within CRS constructors a pattern we could implement as a managed change set?
    • [ ] Would it be useful to include a set of performance metrics on transforms in the tests, to give some benchmarks for any proposed performance enhancements?
      • e.g. example from @andhuang-CLGX https://github.com/SciTools/cartopy/issues/1895#issuecomment-932323055

i'd like to help with this endeavour, but i'll state up front that I'm not about to add direct builds against PROJ and commit to long term maintenance of said builds myself

many thanks all marqh

marqh avatar Oct 19 '21 08:10 marqh

@marqh, investigations into the slowdown would be most welcome!

One of the biggest areas that could use a speed-up would be the bisection routine. https://github.com/SciTools/cartopy/blob/72216fd3d68aaf3fb693327ff88baa032a0170db/lib/cartopy/trace.pyx#L441 Calling pyproj's Transform with a single point over and over is not ideal. So, thinking of a way to avoid that iteration, or speed it up and call the transform with 4-10 points each time as a look-ahead, may be worthwhile.

For caching the CRS constructor, I am not sure whether that should be implemented on Cartopy's side, or if a PR to pyproj would be the better location for it. You'd likely need to override __new__ to return from a global cache stored somewhere. It seems doable if the same parameters are passed in to the constructor, but that could be optimistic thinking too.

There are already performance metrics that can be run in the benchmarks folder: https://github.com/SciTools/cartopy/tree/master/benchmarks those can be run with asv. We aren't currently running them on a CI system, but perhaps something could be set up to check for major regressions with it automated.

greglucas avatar Oct 20 '21 16:10 greglucas

@leeyupeng119 and @andhuang-CLGX: Have you tried setting PYPROJ_GLOBAL_CONTEXT=ON in the environment, like @greglucas suggested, and see if that improves things?

Yes, I have try, but still slow.

leeyupeng119 avatar Oct 25 '21 03:10 leeyupeng119

I'm digging at this general issue too now and find that PYPROJ_GLOBAL_CONTEXT=ON is not making any difference in the run-time. For an interesting note, my current debugging script goes from a run-time of 4 seconds to 11 seconds just by adding cfeature.LAND to the plot. That's a bit puzzling and where I am currently looking.

akrherz avatar Oct 27 '21 16:10 akrherz

@akrherz I find an even larger performance penalty by adding cfeature.OCEAN (esp. with 10m res and transforming to another CRS), but this pre-dates 0.20.

ktyle avatar Oct 27 '21 16:10 ktyle

@ktyle Yeah, I realized that the ocean is not a necessary feature to draw. Just make the axes background the ocean color and then draw the land, which is much less expensive to do.

So I think I just reproduced what @greglucas stated above, the segment-by-segment reprojection in trace.pyx is what is very slow and what causes cfeature.LAND to be so slow, given the large number of segments it has.

akrherz avatar Oct 27 '21 16:10 akrherz

@akrherz, can you see if PR https://github.com/SciTools/cartopy/pull/1918 helps your use-case at all?

greglucas avatar Nov 12 '21 15:11 greglucas

@akrherz, can you see if PR #1918 helps your use-case at all?

Sorry for the delay, sadly not much.

import cartopy
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt

fig = plt.figure()

ax = fig.add_subplot(111, projection=ccrs.Mercator())
feature = cfeature.NaturalEarthFeature(
    name="ocean",
    category="physical",
    scale="10m",
    edgecolor="#000000",
    facecolor="#AAAAAA",
)
ax.add_feature(feature)
fig.savefig('test.png')

~13.8 seconds to ~13.4 seconds.

akrherz avatar Nov 16 '21 20:11 akrherz

Just to report the output of a tiny benchmark, comparing a projection creation and a projection comparison with a simple string that is expected to return False:

  • 0.19.0.post1: image

  • 0.20.1: image

In this case, the creation of a projection is about 25x slower and the equality test is 3,500,000x slower.

maximlt avatar Nov 30 '21 20:11 maximlt

It's unclear from your example if there's something else going on. You run p = cartopy.crs.PlateCarree(), but your timeit is checking proj.

dopplershift avatar Nov 30 '21 20:11 dopplershift

@dopplershift indeed this might have been misleading. I updated the screenshots in my previous message. The outcome is still the same.

maximlt avatar Nov 30 '21 21:11 maximlt

@maximlt, the updated method of PROJ is more robust when comparing CRS and has many different methods to create a CRS (PROJ string, Authority, WKT, PROJ JSON, etc...). Since proj is not a valid input to create a CRS, PROJ will exhaust all possibilities to create a CRS for you every time you perform a comparison in the timeit operation. I would imagine if you were able to successfully create a cartopy.crs.CRS beforehand for comparison purposes, it would be more efficient.

snowman2 avatar Nov 30 '21 21:11 snowman2

image

snowman2 avatar Nov 30 '21 21:11 snowman2

While the microbenchmark definitely agrees with the hotspots identified by @andhuang-CLGX's profiling, I think more detail at what holoviews is doing is necessary to see how equals and __init__ to know exactly what's causing it to be slow.

dopplershift avatar Nov 30 '21 21:11 dopplershift

Thanks for your replies and explanations! I'm well aware that using pyproj is an improvement and reported this small benchmark here for others to be aware of that, since geoviews (actually holoviews) might not be the only library impacted by this change.

maximlt avatar Dec 01 '21 07:12 maximlt

I was going to post a new issue but there's reasonable chance mine is related to this.

I'm trying to place markers on a simple world map and am finding the map takes a very long time (about 30s) to zoom in on, say, Great Britain, using Matplotlib's standard zoom tool (the magnifying glass icon). This happens in both the Robinson or Mercator projections. Here's the plot where I'm trying to zoom in:

import matplotlib.pyplot as pl
import cartopy

ax = pl.subplot(1, 1, 1, projection=cartopy.crs.Robinson())
ax.set_global()

ax.add_feature(cartopy.feature.OCEAN, zorder=0)
ax.add_feature(cartopy.feature.LAND, zorder=0, edgecolor='black')

pl.show()

I'm running Fedora 35 and installed geos 3.9.2 and proj 8.2.0 through the default repos. I have Python 3.10.1 and installed cartopy with python3 -m pip install cartopy --user, which has given me version 0.20.1. From my matplotlibrc, I think I'm using the TkAgg backend for matplotlib, which is version 3.5.1.

I'll keep experimenting to see if anything jumps out at me. This might be related to #1895 but PYPROJ_GLOBAL_CONTEXT=ON didn't help and I got build errors when downgrading to cartopy<0.20.

Full environment definition

Operating system

Fedora 35

Cartopy version

0.20.1

pip list

$ pip list
Package            Version    Location
------------------ ---------- --------------------------------
appdirs            1.4.4
argcomplete        1.12.3
astropy            5.0
astroquery         0.4.5
autograd           1.3
backcall           0.2.0
Beaker             1.10.0
beautifulsoup4     4.9.3
blivet             3.4.2
blivet-gui         2.3.0
bokeh              2.4.2
Brlapi             0.8.2
Brotli             1.0.9
cached-property    1.5.2
Cartopy            0.20.1
certifi            2021.10.8
cffi               1.14.6
chardet            4.0.0
charset-normalizer 2.0.4
chrome-gnome-shell 0.0.0
click              8.0.1
corner             2.2.1
cryptography       3.4.7
cupshelpers        1.0
cycler             0.10.0
Cython             0.29.24
dasbus             1.6
dbus-python        1.2.18
decorator          5.0.9
distro             1.6.0
emcee              3.1.1
fbpca              1.0
fedora-third-party 0.8
fonttools          4.26.1
fros               1.1
fs                 2.4.11
future             0.18.2
gpg                1.15.1
h5py               3.4.0
html5lib           1.1
humanize           0.5.1
idna               3.2
importlib-metadata 4.10.0
ipython            7.30.1
jedi               0.18.1
jeepney            0.7.1
Jinja2             3.0.3
joblib             1.1.0
keyring            23.5.0
kiwisolver         1.3.2
langtable          0.0.56
libcomps           0.1.18
lightkurve         2.0.11
lxml               4.6.3
Mako               1.1.4
MarkupSafe         2.0.0
matplotlib         3.5.1
matplotlib-inline  0.1.3
memoization        0.4.0
munkres            1.1.2
ndsplines          0.1.2
nftables           0.1
numpy              1.21.5
oktopus            0.1.2
olefile            0.46
packaging          21.0
pandas             1.3.5
parso              0.8.3
Paste              3.5.0
patsy              0.5.2
pexpect            4.8.0
pickleshare        0.7.5
pid                2.2.3
Pillow             8.3.2
pip                21.2.3
ply                3.11
productmd          1.33
prompt-toolkit     3.0.24
ptyprocess         0.6.0
pwquality          1.4.4
pycairo            1.20.1
pycparser          2.20
pycrypto           2.6.1
pycups             2.0.1
pycurl             7.44.1
pyenchant          3.2.2
pyerfa             2.0.0.1
Pygments           2.11.1
PyGObject          3.42.0
pykickstart        3.34
pyOpenSSL          21.0.0
pyparsing          2.4.7
pyparted           3.11.7
pyprof2calltree    1.4.5
pyproj             3.3.0
PyQt5              5.15.6
PyQt5-sip          12.9.0
pyshp              2.1.3
PySocks            1.7.1
python-augeas      1.1.0
python-dateutil    2.8.1
python-meh         0.50
pytz               2021.3
pyudev             0.22.0
pyvo               1.2
pyxdg              0.27
PyYAML             6.0
regex              2021.11.10
requests           2.26.0
requests-file      1.5.1
requests-ftp       0.3.1
rpm                4.17.0
scikit-learn       1.0.2
scipy              1.7.3
SecretStorage      3.3.1
selinux            3.3
sepolicy           3.3
setools            4.4.0
setuptools         57.4.0
Shapely            1.8.0
simpleaudio        1.0.4
simpleline         1.8.2
six                1.16.0
slip               0.6.4
slip.dbus          0.6.4
sos                4.2
soupsieve          2.3.1
SSSDConfig         2.6.1
systemd-python     234
Tempita            0.5.2
threadpoolctl      3.0.0
tornado            6.1
tqdm               4.62.3
traitlets          5.1.1
typing_extensions  4.0.1
ultranest          3.3.3
uncertainties      3.1.6
urllib3            1.26.6
wcwidth            0.2.5
webencodings       0.5.1
wheel              0.36.2
zipp               3.7.0

warrickball avatar Jan 07 '22 20:01 warrickball

I tried running my script through cProfile by opening the plot, zooming and then pressing the close button, so that the program exited as soon as the zoom (on a region of northern Europe, including e.g. North Sea and Baltic Sea) was finished. Not sure if it might be helpful. The full terminal output is attached here and the snippet below is the first few lines (until tottime < 1s).

209131486 function calls (208964453 primitive calls) in 170.533 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 11193149   27.966    0.000   30.714    0.000 coords.py:61(__iter__)
     9112   25.372    0.003   28.662    0.003 linestring.py:229(geos_linestring_from_py)
    49192    9.034    0.000   11.600    0.000 coords.py:164(xy)
  3583410    8.797    0.000   15.593    0.000 {method '_transform' of 'pyproj._transformer._Transformer' objects}
  7166820    8.612    0.000   17.505    0.000 utils.py:88(_copytobuffer)
    63741    7.517    0.000   25.296    0.000 polygon.py:438(geos_linearring_from_py)
  3583410    6.864    0.000   44.053    0.000 transformer.py:649(transform)
  1017607    6.083    0.000   19.258    0.000 coords.py:76(__getitem__)
    14552    6.048    0.000   50.662    0.003 {cartopy.trace.project_linear}
  7166636    4.858    0.000    4.858    0.000 utils.py:66(_copytobuffer_return_scalar)
 33325097    4.201    0.000    4.201    0.000 {built-in method _ctypes.byref}
 37087951    4.061    0.000    4.063    0.000 {built-in method builtins.isinstance}
  2184301    3.451    0.000   11.461    0.000 coords.py:43(_update)
  2350772    3.287    0.000    5.213    0.000 predicates.py:23(__call__)
  7166820    2.539    0.000    2.539    0.000 utils.py:138(_convertback)
        1    2.522    2.522  168.790  168.790 {built-in method exec}
  2350715    2.464    0.000    8.470    0.000 base.py:709(is_empty)
       19    2.256    0.119   62.185    3.273 crs.py:955(_attach_lines_to_boundary)
 11078928    2.145    0.000    3.286    0.000 linestring.py:258(_coords)
  1116973    2.047    0.000    7.965    0.000 coords.py:51(__len__)
  3609193    1.916    0.000    2.628    0.000 enum.py:359(__call__)
  3583413    1.907    0.000    4.507    0.000 enums.py:13(create)
 14394652    1.675    0.000    3.156    0.000 {built-in method builtins.hasattr}
  3583412    1.552    0.000    1.552    0.000 transformer.py:313(_transformer)
      250    1.490    0.006    1.490    0.006 {method 'read' of '_ssl._SSLSocket' objects}
  9464100    1.274    0.000    1.274    0.000 base.py:228(_geom)
1486562/1465361    1.105    0.000    1.123    0.000 base.py:245(__setattr__)
  6494596    1.028    0.000    1.028    0.000 {method 'append' of 'array.array' objects}
...

warrickball avatar Jan 08 '22 14:01 warrickball

This shapely issue might be related to my issue, though it also makes me suspect that my issue isn't related to the original one in this thread. The runtime of my example decreased to ~12s when I installed shapely from the GitHub repo following the example in this comment:

pip install --verbose --no-binary=shapely "git+https://github.com/shapely/[email protected]#egg=shapely"

Happy to open a new issue if mine is independent.

warrickball avatar Jan 08 '22 20:01 warrickball

I realized that the ocean is not a necessary feature to draw. Just make the axes background the ocean color and then draw the land, which is much less expensive to do. @akrherz

I've also had numerous problems with Cartopy 0.20.1/Shapely 1.8 suddenly being unbearably slow after I recently upgraded to them. Plotting the terrain field for a large WRF domain of 1062 x 512 points (dx = 1 km) suddenly took nearly 11m 30s. That's unacceptably slow. With my prior version of Cartopy (0.16, I think??), it took "only" 7 min, which was still far too slow to be useful. (I'm running this on NCAR's Cheyenne, on a head node.)

I set PYPROJ_GLOBAL_CONTEXT=ON, but that had little effect on the speed for this plot. (I noticed more of a speed-up for other plots I tried of different data, though.)

I then used the transform_first argument to plt.contourf ([https://scitools.org.uk/cartopy/docs/latest/gallery/scalar_data/contour_transforms.html]), and that resulted in a noticeable speed-up, to 10m 45s.

However, commenting out the call to draw the oceans (ax.add_feature(oceans)), but instead assigning my ocean color using ax.set_facecolor, resulted in a SIGNIFICANT speed-up, all the way down to 12 s. If I eliminated the transform_first argument again, it slowed down to just over 1 min, so transform_first is still quite helpful.

So for a temporary workaround to anyone experiencing similar major slowdowns with Cartopy 0.20+, hopefully implementing these steps will help:

  1. Don't draw the oceans unless they're actually necessary (MAJOR speed-up).
  2. Use the transform_first keyword in the contourf call (moderate speed-up).
  3. Also try PYPROJ_GLOBAL_CONTEXT=ON for a potential speed-up (in some situations, but not all).

I also very much look forward to Cartopy 0.20.2 release to eliminate the ShapelyDeprecationErrors (#1936) that also started happening constantly once I upgraded.

jaredalee avatar Jan 12 '22 05:01 jaredalee

However, commenting out the call to draw the oceans (ax.add_feature(oceans)), but instead assigning my ocean color using ax.set_facecolor, resulted in a SIGNIFICANT speed-up, all the way down to 12 s.

Thanks @jaredalee! I found this very helpful, reducing the time to zoom in on my point map from ~30s to about ~3s.

If anyone else is curious, Cartopy's water colour is np.array((152, 183, 226)) / 256.:

https://github.com/SciTools/cartopy/blob/591fb5450e11b42b6de1cebe4f240112f915bd52/lib/cartopy/feature/init.py#L22-L24

warrickball avatar Jan 12 '22 09:01 warrickball

Thanks for posting these results! I think this points to the need for some documentation around potential speedups that people can try all in one place rather than spread throughout the GitHub Issues.

From my reading of everything, it seems like the major issue is the oceans feature and not land or coastlines? This may mean we should look into optimizing polygon edge closing with the boundaries.

You should be able to get the colors programmatically from that location too: cartopy.feature.COLORS["water"] https://scitools.org.uk/cartopy/docs/latest/reference/generated/cartopy.feature.COLORS.html#cartopy.feature.COLORS

greglucas avatar Jan 12 '22 14:01 greglucas