Speeding up performance
A few versions ago, xagg introduced a numba backend, which can be enabled via:
```python
with xa.set_options(impl='numba'):
    agg = xa.aggregate(ds, wm)
```
This drastically sped up calculations relative to the previous for-loop-and-dot-product backend, but it still isn't ideal for very large aggregation tasks (say, 0.25-degree climate data aggregated over global ADM2 shapefiles).
Part of the problem lies in how xagg currently handles the indexing between regions and grid cells. Each region is stored as a row in a pandas dataframe, with a column holding a numpy array of pixel indices. The numba implementation extracts these arrays into a list and processes each list item separately, which still seems overly complicated.
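To make the bottleneck concrete, here is a simplified sketch of that layout and the resulting per-region loop. The column names (`pix_idxs`, `rel_area`) and data are hypothetical, not xagg's actual internals:

```python
import numpy as np
import pandas as pd

# Hypothetical sketch of the layout described above: one dataframe row
# per region, with a column holding a numpy array of pixel indices and
# another holding the matching relative-area weights.
wm_df = pd.DataFrame({
    "region": ["A", "B"],
    "pix_idxs": [np.array([0, 1, 2]), np.array([3, 4])],
    "rel_area": [np.array([0.5, 0.3, 0.2]), np.array([0.6, 0.4])],
})

# Flat array of grid-cell values (one value per pixel).
field = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# The per-region loop: each row's index array is gathered and reduced
# separately, so the work cannot be batched across regions.
means = [
    np.sum(field[row.pix_idxs] * row.rel_area) / np.sum(row.rel_area)
    for row in wm_df.itertuples()
]
# means[0] is region A's area-weighted mean, means[1] is region B's
```

Because each region's index array has a different length, nothing here vectorizes across regions, which is exactly the shape of problem ragged-array layouts target.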
I wonder whether this could be sped up even further by moving the core computation to a ragged-array library (like awkward array)?