Speeding up performance

ks905383 opened this issue 8 months ago

A few versions ago, xagg added a numba backend, which can be enabled with:

    import xagg as xa

    with xa.set_options(impl='numba'):
        agg = xa.aggregate(ds, wm)  # ds: xarray Dataset; wm: weightmap from xa.pixel_overlaps

This has drastically sped up calculations compared with the previous for-loop and dot-product backends, but it is still not ideal for very large aggregation tasks (say, 0.25° climate data aggregated to global ADM2 shapefiles).

Part of the problem lies in how xagg currently handles the indexing between regions and grid cells. Each region is stored as a row in a pandas DataFrame, with a column holding a numpy array of pixel indices. The numba implementation extracts these arrays into a list and processes each list item separately, which still seems overly complicated.
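To make the list-of-arrays layout concrete, here is a minimal plain-NumPy sketch of the per-region loop. The names `pix_idxs`, `weights`, and `aggregate_loop` are hypothetical stand-ins, not xagg's internals, and the real backend compiles its loop with numba; the sketch only shows the shape of the computation: one iteration, with its own small gather, per region.

```python
import numpy as np

# Hypothetical stand-ins for the per-region DataFrame column: each region
# holds an array of flattened pixel indices and matching area weights.
pix_idxs = [np.array([0, 1, 4]), np.array([2, 3]), np.array([4, 5, 6, 7])]
weights = [np.array([0.5, 0.25, 0.25]), np.array([1.0, 1.0]),
           np.array([0.1, 0.2, 0.3, 0.4])]

# Toy gridded data: 2 time steps x 8 flattened pixels.
values = np.arange(16, dtype=float).reshape(2, 8)


def aggregate_loop(values, pix_idxs, weights):
    """Area-weighted mean per region, visiting one region at a time."""
    out = np.empty((values.shape[0], len(pix_idxs)))
    for r, (idx, w) in enumerate(zip(pix_idxs, weights)):
        # Gather this region's pixels for every time step, then weight them.
        out[:, r] = values[:, idx] @ w / w.sum()
    return out


result = aggregate_loop(values, pix_idxs, weights)
print(result)
```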

I wonder if there's a way to speed this up further by moving the core computation to a ragged-array library (such as Awkward Array)?
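Ragged-array libraries such as Awkward Array store a list of variable-length arrays as one flat content buffer plus an offsets array marking where each sublist starts. A minimal sketch of that idea, written by hand in NumPy rather than with Awkward's own API and using hypothetical function names: the per-region index and weight arrays are concatenated once, and `np.add.reduceat` then sums each region's segment in a single vectorized call, with no Python-level loop over regions. It assumes every region overlaps at least one pixel, since `reduceat` does not return zero for empty segments.

```python
import numpy as np


def to_flat(pix_idxs, weights):
    """Concatenate per-region arrays into flat content plus an offsets array."""
    counts = np.array([len(idx) for idx in pix_idxs])
    if np.any(counts == 0):
        # np.add.reduceat would silently mis-handle empty segments.
        raise ValueError("every region must overlap at least one pixel")
    offsets = np.concatenate([[0], np.cumsum(counts)])
    return np.concatenate(pix_idxs), np.concatenate(weights), offsets


def aggregate_flat(values, flat_idx, flat_w, offsets):
    """Area-weighted mean per region using segment sums instead of a loop."""
    weighted = values[:, flat_idx] * flat_w  # shape (time, total overlaps)
    starts = offsets[:-1]  # first flat position of each region
    num = np.add.reduceat(weighted, starts, axis=1)
    den = np.add.reduceat(flat_w, starts)
    return num / den


# Hypothetical per-region inputs in the list-of-arrays layout.
pix_idxs = [np.array([0, 1, 4]), np.array([2, 3]), np.array([4, 5, 6, 7])]
weights = [np.array([0.5, 0.25, 0.25]), np.array([1.0, 1.0]),
           np.array([0.1, 0.2, 0.3, 0.4])]
values = np.arange(16, dtype=float).reshape(2, 8)  # 2 time steps x 8 pixels

flat_idx, flat_w, offsets = to_flat(pix_idxs, weights)
result = aggregate_flat(values, flat_idx, flat_w, offsets)
print(offsets)  # region boundaries in the flat arrays
print(result)
```

In this sketch the flattening happens once per weightmap, so it can be reused across variables and time steps; the remaining work per call is one gather and one segment sum.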

Aug 11 '25 15:08 ks905383