uxarray icon indicating copy to clipboard operation
uxarray copied to clipboard

Matplotlib polycollection plots for global domains plot bad data after rasterization update

Open mkavulich opened this issue 1 month ago • 9 comments

Version

v2025.09.0

How did you install UXarray?

Conda

What happened?

Between v2025.05 and v2025.06 (when raster image support was released), something was broken in the polycollection method of plotting with matplotlib integration. I have made a minimal reproducible example script, using Python 3.12.11 (though I have seen this with earlier Python versions as well), though I'm not sure how to include the data.

When plotting with 2025.05, things look good:

Image

When plotting with 2025.09, things look bad:

Image

And when I add the argument periodic_elements='split' to uxda.to_polycollection(), the code fails entirely:

Traceback (most recent call last):
  File "/glade/work/kavulich/MPAS/plotting/plotting_errors/2025.09/mpas_plot/MRE.py", line 51, in <module>
    plotit(dataset[var].isel(Time=0),var,0,file,ftime_dt)
  File "/glade/work/kavulich/MPAS/plotting/plotting_errors/2025.09/mpas_plot/MRE.py", line 16, in plotit
    pc=uxda.to_polycollection(periodic_elements='split')
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/work/kavulich/MPAS/plotting/plotting_errors/2025.09/mpas_plot/conda/envs/mpas_plot/lib/python3.12/site-packages/uxarray/core/dataarray.py", line 293, in to_polycollection
    poly_collection = self.uxgrid.to_polycollection(**kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/work/kavulich/MPAS/plotting/plotting_errors/2025.09/mpas_plot/conda/envs/mpas_plot/lib/python3.12/site-packages/uxarray/grid/grid.py", line 2278, in to_polycollection
    ) = _grid_to_matplotlib_polycollection(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: uxarray.grid.geometry._grid_to_matplotlib_polycollection() got multiple values for keyword argument 'periodic_elements'

Interestingly, these problems don't appear to affect regional MPAS domains, so I suspect that the issue has to do with periodic/global data specifically. I have observed the same issue with 2025.06 so I believe that's when the problem was introduced. The data file is too large to upload here but it is available on NCAR GLADE at the above path; however, this should be reproducible with any global/periodic MPAS data you have lying around. Let me know if you have any questions or I can provide any more info.

What did you expect to happen?

Pretty plots, as shown above.

Can you provide a MCVE to repoduce the bug?

#!/usr/bin/env python3
import os
import time
from datetime import datetime

import matplotlib.pyplot as plt
import cartopy.feature as cfeature
import cartopy.crs as ccrs

import uxarray as ux

def plotit(uxda: ux.UxDataArray,var: str,lev: int,filepath: str,ftime) -> None:
    plotstart = time.time()

    #pc=uxda.to_polycollection(periodic_elements='split')
    pc=uxda.to_polycollection()
    pc.set_antialiased(False)
    pc.set_cmap('viridis')

    fig, ax = plt.subplots(1, 1, figsize=(8,4), dpi=300, constrained_layout=True,
                           subplot_kw=dict(projection=ccrs.PlateCarree()))

    ax.add_feature(cfeature.NaturalEarthFeature(category='cultural', scale='50m',edgecolor='k',facecolor='none',linewidth=0.2, name='admin_0_countries'))
    ax.add_feature(cfeature.NaturalEarthFeature(category='physical',color='k',facecolor='none',
                           linewidth=0.5, scale='10m', name='coastline'))

    pc.set_transform(ccrs.PlateCarree())

    coll = ax.add_collection(pc, autolim=True)
    ax.autoscale()

    plt.title('Plot of 2-meter temperature for MPAS forecast, 2025-05-14 00:00:00',fontsize=8)

    cbar = plt.colorbar(coll,ax=ax,orientation='vertical')
    cbar.set_label('Units: {units}', fontsize=8)

    plt.savefig('example.png',format='png')
    plt.close(fig)
    print(f"Done saving 'example.png'. Plot generation {time.time()-plotstart} seconds")


if __name__ == "__main__":

    file='/glade/work/kavulich/MPAS/tutorial/120km_global/rundir/history.2025-05-14_00.00.00.nc'
    var='t2m'
    timestring='2025-05-14_00:00:00'
    ftime_dt = datetime.strptime(timestring.strip(), "%Y-%m-%d_%H:%M:%S")
    dataset=ux.open_dataset(file,file)

    plotit(dataset[var].isel(Time=0),var,0,file,ftime_dt)

mkavulich avatar Oct 10 '25 00:10 mkavulich

Can you remove the pc.set_transform(ccrs.PlateCarree()) line and see if that fixes things? We switched to using ccrs.Geodetic() as our transform, which uses Cartopy's splitting for periodic elements (hence why periodic_elements) is not used.

https://github.com/UXARRAY/uxarray/blob/460eb23e72aaa9c8b617870de7e9f6b2c8939d8f/uxarray/grid/grid.py#L2332

We added some logic to safegaurd against duplicate parameters in https://github.com/UXARRAY/uxarray/pull/1351, but looks like we missed the periodic_elements parameter.

https://github.com/UXARRAY/uxarray/blob/460eb23e72aaa9c8b617870de7e9f6b2c8939d8f/uxarray/grid/grid.py#L2307-L2320

philipc2 avatar Oct 10 '25 00:10 philipc2

@philipc2 Thanks for the quick response! Making this change appears to result in a different set of bad data:

Image

I see that the new transform automatically stitches the period elements as well, which means that plotting time is more than doubled. Is there no way to disable this in the new versions for quicker plots?

mkavulich avatar Oct 10 '25 01:10 mkavulich

Thanks for the quick response! Making this change appears to result in a different set of bad data

That's interesting, it looks like there's a mismatch somewhere with the indices.

Is there no way to disable this in the new versions for quicker plots

I'd suggest using to_raster() if you are looking for quicker plots. We only really suggest using the PolyCollection for small/exploratory visualizations.

philipc2 avatar Oct 10 '25 01:10 philipc2

def plotit_raster(uxda: ux.UxDataArray) -> None:
    plotstart = time.time()

    fig, ax = plt.subplots(1, 1, figsize=(8,4), dpi=300, constrained_layout=True,
                           subplot_kw=dict(projection=ccrs.PlateCarree()))

    ax.add_feature(cfeature.NaturalEarthFeature(category='cultural', scale='50m',edgecolor='k',facecolor='none',linewidth=0.2, name='admin_0_countries'))
    ax.add_feature(cfeature.NaturalEarthFeature(category='physical',color='k',facecolor='none',
                           linewidth=0.5, scale='10m', name='coastline'))

    ax.set_global()

    raster = uxda.to_raster(ax=ax)
    img = ax.imshow(
        raster, cmap="viridis", origin="lower", extent=ax.get_xlim() + ax.get_ylim()
    )

    ax.autoscale()

    plt.title('Plot of 2-meter temperature for MPAS forecast, 2025-05-14 00:00:00',fontsize=8)

    cbar = plt.colorbar(img ,ax=ax,orientation='vertical')
    cbar.set_label('Units: {units}', fontsize=8)

    plt.savefig('example.png',format='png')
    plt.close(fig)
    print(f"Done saving 'example.png'. Plot generation {time.time()-plotstart} seconds")

Can you give this a try? This should be much faster. On my local machine, I was able to plot a 30km grid in about 5s (initial plot is a bit slower due to KDTree construciton, but subsequent ones are much faster)

Image

philipc2 avatar Oct 10 '25 01:10 philipc2

If you'd like to increase the resolution, we can set a pixel_ratio parameter too to sample more pixels.

https://uxarray.readthedocs.io/en/latest/user-guide/mpl.html#controlling-the-resolution

philipc2 avatar Oct 10 '25 01:10 philipc2

I'd suggest using to_raster() if you are looking for quicker plots. We only really suggest using the PolyCollection for small/exploratory visualizations.

Okay thanks, I suspected that it was deprecated but wanted to make sure. The intent is to move to to_raster() but I want to be able to have some overlap for ensuring reproducibility of all features.

Thanks for your help, I was able to plot with to_raster() and get most of the previous plotting features working! I'll let you know if I run into any further issues.

mkavulich avatar Oct 10 '25 22:10 mkavulich

FWIW, I have noticed the same thing. On 2025.5.0, things were good, 2025.10.0, things are bad with to_polycollection. I originally thought it was due to pinning numpy to <2.3, but building from source doesn't solve the issue. Based on this ticket, the 2025.6.0 release contains the pain point.

I have found that to_raster() is actually slower apples-to-apples than to_polycollection() for my VR SE mesh.

Using to_polycollection() (2025.5.2):

(uxarray) E2-MET-WKML070:ux_dev cmz5202$ time python make-mov-frames.py 
NumPy version (from inside uxarray): 2.3.4
Found 1 data files
Will process 1 files total
Output frames will be saved to: ./frames_Irene/
--------------------------------------------------
Processing file 1/1: 15min_snap.INSTANT.nmins_x15.2011-08-27-00000.nc
  Found 4 time steps in this file
  Processing time step 1/4
    Warning: 100.0% of data is NaN
    Skipping frame due to excessive NaN values
  Processing time step 2/4
    Debug - timestamp type: <class 'numpy.ndarray'>, value: 2011-08-27 00:15:00
    Min: 85.19, Max: 338.16
    Timestamp: 2011-08-27 00:15 UTC
    Saved: ./frames_Irene/irene_201108270015.png
  Processing time step 3/4
    Debug - timestamp type: <class 'numpy.ndarray'>, value: 2011-08-27 00:30:00
    Min: 84.25, Max: 337.98
    Timestamp: 2011-08-27 00:30 UTC
    Saved: ./frames_Irene/irene_201108270030.png
  Processing time step 4/4
    Debug - timestamp type: <class 'numpy.ndarray'>, value: 2011-08-27 00:45:00
    Min: 85.14, Max: 336.53
    Timestamp: 2011-08-27 00:45 UTC
    Saved: ./frames_Irene/irene_201108270045.png
--------------------------------------------------

Processing complete!
Total frames created: 3
Frames skipped (NaN): 1
Frames saved to: ./frames_Irene/

real	1m37.543s
user	1m35.224s
sys	0m1.900s

Using to_raster() (2025.10.0)

(uxarray) E2-MET-WKML070:ux_dev cmz5202$ time python make-mov-frames.py 
NumPy version (from inside uxarray): 2.3.4
Found 1 data files
Will process 1 files total
Output frames will be saved to: ./frames_Irene/
--------------------------------------------------
Processing file 1/1: 15min_snap.INSTANT.nmins_x15.2011-08-27-00000.nc
  Found 4 time steps in this file
  Processing time step 1/4
    Warning: 100.0% of data is NaN
    Skipping frame due to excessive NaN values
  Processing time step 2/4
    Debug - timestamp type: <class 'numpy.ndarray'>, value: 2011-08-27 00:15:00
    Min: 85.19, Max: 338.16
    Timestamp: 2011-08-27 00:15 UTC
    Saved: ./frames_Irene/irene_201108270015.png
  Processing time step 3/4
    Debug - timestamp type: <class 'numpy.ndarray'>, value: 2011-08-27 00:30:00
    Min: 84.25, Max: 337.98
    Timestamp: 2011-08-27 00:30 UTC
    Saved: ./frames_Irene/irene_201108270030.png
  Processing time step 4/4
    Debug - timestamp type: <class 'numpy.ndarray'>, value: 2011-08-27 00:45:00
    Min: 85.14, Max: 336.53
    Timestamp: 2011-08-27 00:45 UTC
    Saved: ./frames_Irene/irene_201108270045.png
--------------------------------------------------

Processing complete!
Total frames created: 3
Frames skipped (NaN): 1
Frames saved to: ./frames_Irene/

real	2m34.384s
user	7m8.248s
sys	0m2.832s

Also, it is worth noting that to_raster() seems to essentially populate a 2D x,y mesh. However, this causes one to lose the face-integrated structure of the unstructured grid. See these two examples.

to_polycollection()

Image

to_raster()

Image

The top is the desired behavior given the cubed-sphere grid.

It's possible I could recover a better raster image with a higher pixel density, but given that my benchmarks have to_raster() to be a bit slower, probably via kd tree building, I don't think that's an optimal solution.

zarzycki avatar Oct 30 '25 19:10 zarzycki

@mkavulich @zarzycki thanks a lot for the feedback. We'll prioritize fixing these issues that occurred on to_polyCollection()

erogluorhan avatar Oct 30 '25 19:10 erogluorhan

I did some more digging on this yesterday. I've found something that I think partly explains my issue but perhaps not @mkavulich -- but since they may be related, I'll leave this here.

After 2025.5.2, the ability to pass a projection when creating a polycollection was removed.

Ex:

pc = data.to_polycollection(projection=projection)

no longer works. The code throws a warning and then does not include the projection.

After more testing, I found that my PolyCollection code actually does work with 2025.10.0 but runs very slowly (approximately 10 times slower). This appears to be because the polycollection generated by uxarray needs to be reprojected during the plotting step in cartopy/mpl.

This slowdown can be verified in 2025.5.2 by running a block of code with and without the projection arg in to_polycollection() for a large mesh (e.g., a 3km MPAS mesh). I unfortunately deleted my timings, but it was something like 10 minutes for a single global image with no projection passed in to t_pc and 1.2 minutes with projection passed in to t_pc. I am not sure why the plotting is faster when projecting the polycollection at generation vs. at plot time -- I assume some vectorized operations somewhere...

I have been able to recover the "faster" plotting in 2025.10.0 by applying these changes in grid.py, which permits me to pass in a projection arg, which is used.

@@ -2292,6 +2292,7 @@ class Grid:
 
     def to_polycollection(
         self,
+        projection=None,
         **kwargs,
     ):
         """Constructs a ``matplotlib.collections.PolyCollection``` consisting
@@ -2326,10 +2327,9 @@ class Grid:
             poly_collection,
             corrected_to_original_faces,
         ) = _grid_to_matplotlib_polycollection(
-            self, periodic_elements="ignore", projection=None, **kwargs
+            self, periodic_elements="exclude", projection=projection, **kwargs
         )
 
-        poly_collection.set_transform(ccrs.Geodetic())
         self._cached_poly_collection = poly_collection
 
         return copy.deepcopy(poly_collection)

Note, if periodic_elements=ignore (as is the default in 2025.10.0), the projection in _grid_to_matplotlib_polycollection still happens, but then it isn't used, so I forced it to periodic_elements=exclude, which reproduces the previous behavior.

HOWEVER, I noticed something else weird,

which may be related to @mkavulich -- when my data includes antimeridian points, the indexing gets jumbled:

grid_path = '../Philadelphia_TC_grid_v2_ne128x8_pg2_SCRIP.nc'
data_path = '../Irene/15min_snap.INSTANT.nmins_x15.2011-08-28-54000.nc'
variable = 'LW_flux_up_at_model_top'
uxds = ux.open_dataset(grid_path, data_path)
data = uxds[variable].isel(time=0)
projection = ccrs.Orthographic(central_longitude=-90, central_latitude=41)
pc = data.to_polycollection(projection=projection)
Image

but when I subset the same data:

grid_path = '../Philadelphia_TC_grid_v2_ne128x8_pg2_SCRIP.nc'
data_path = '../Irene/15min_snap.INSTANT.nmins_x15.2011-08-28-54000.nc'
variable = 'LW_flux_up_at_model_top'
uxds = ux.open_dataset(grid_path, data_path)
lon_bounds = (-87.6298 - 30, -87.6298 + 30)
lat_bounds = (41.8781 - 30, 41.8781 + 30)
data = uxds[variable].isel(time=0).subset.bounding_box(
   lon_bounds,
   lat_bounds,
)
projection = ccrs.Orthographic(central_longitude=-90, central_latitude=41)
pc = data.to_polycollection(projection=projection)

things look OK.

Image

So my TL;DR is:

  • My plotting of a high-density mesh with polycollection does seem to work with 2025.10.0, but is very slow due to the removal of the projection during the generation of the mpl polycollection
  • Partially reverting to_polycollection to permit projections (and exclude periodic elements) speeds things back up.
  • However, when the polycollection set is large enough (perhaps includes periodic or degenerate elements?) something in the indexing breaks.

zarzycki avatar Oct 31 '25 13:10 zarzycki