Use `meta` correctly in `map_blocks` to prevent dask from passing a 0d array through the regridder
Context
When `meta` is not supplied, `dask.array.map_blocks` tries to infer the output array type by calling the provided function on 0d versions of its inputs, as documented in https://docs.dask.org/en/latest/generated/dask.array.map_blocks.html
In https://github.com/SciTools/iris/blob/4abaa8f5d4be918b2e0b7bd42fbcc1e0a0196dd6/lib/iris/_lazy_data.py#L355-L388 we call `map_blocks` without `meta`, including when handing it an area weighted regridding function (and presumably other times) that cannot handle a 0d array. We do the same elsewhere in the same file too.
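To illustrate the behaviour described above, here is a minimal sketch of calling `map_blocks` with `meta` supplied. The function `needs_2d` and the array `x` are made up for illustration and are not Iris code; the point is only that, with `meta` given, dask has no need to probe the function with a 0d array while building the graph.

```python
import numpy as np
import dask.array as da

calls = []  # records the ndim of every block the function actually sees


def needs_2d(block):
    # Hypothetical stand-in for a regridder: the 2D indexing below
    # would fail if dask handed in a 0d array.
    calls.append(block.ndim)
    return block[:, ::-1] * 1.5


x = da.ones((4, 6), chunks=(2, 6), dtype=np.int32)

# Decide the output dtype up front, then describe the output with a
# zero-size array of the right dimensionality and dtype.
out_dtype = np.promote_types(x.dtype, np.float16)
meta = np.empty((0,) * x.ndim, dtype=out_dtype)

lazy = da.map_blocks(needs_2d, x, dtype=out_dtype, meta=meta)
calls_at_build = list(calls)  # snapshot before any real computation

result = lazy.compute()
```

In this sketch the function is only ever called on the real 2D chunks, during `compute()`.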
Issues arising
- #4574 documents a deprecation warning seen when the Iris tests are run, as the 0d array is passed in by dask and then indexed.
- I don't know if we get performance or safety improvements by adding this in; it's more that we're "doing it properly" / using dask as designed. It seems like a good way to make the codebase easier to understand, though.
Suggestions
- Work out how to choose what the `meta` kwarg should be set to, and set it
- Consider whether the `dtype` argument should also be provided
Turns out I was wrong the other day when I told @trexfeathers that all this needed was working out, and that it didn't need deciding first...
Investigation
map_complete_blocks is called twice within Iris:
_area_weighted
https://github.com/SciTools/iris/blob/2fa7c598d4bcbe7e861b12b28a874360af974421/lib/iris/analysis/_area_weighted.py#L1109-L1111
The return dtype is set within area weighted regridding using np.promote_types(src_data.dtype, np.float16). We could:
- have `_regrid_area_weighted_rectilinear_src_and_grid__perform` calculate the dtype it wants and pass this as an argument to `_regrid_area_weighted_array`.
- tightly couple `_regrid_area_weighted_rectilinear_src_and_grid__perform` and `_regrid_area_weighted_array` by computing the promotion twice, and then passing it as a kwarg to `map_complete_blocks`.
- have `_regrid_area_weighted_array` catch a 0d input array as obviously from dask and return the correct type of array to satisfy it.
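As a rough sketch of the first option, the caller could compute the promoted dtype once and hand it to the worker function. The names `perform` and `regrid_array` are hypothetical stand-ins for the two Iris functions, and the arithmetic is a placeholder for the real regridding.

```python
import numpy as np


def regrid_array(src_data, out_dtype):
    # Hypothetical worker: placeholder arithmetic, cast to the dtype
    # chosen by the caller rather than recomputing the promotion here.
    return (src_data * 0.5).astype(out_dtype)


def perform(src_data):
    # Hypothetical caller: the promotion is computed in exactly one place,
    # so the same value could also be used to build a `meta` argument.
    out_dtype = np.promote_types(src_data.dtype, np.float16)
    return regrid_array(src_data, out_dtype), out_dtype


for name in ("int8", "int16", "int32", "float32"):
    _, promoted = perform(np.arange(4, dtype=name))
    print(name, "->", promoted)
```

The promotion depends on the source dtype, so it cannot be hard-coded; it has to be derived from the source data each time.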
_regrid
https://github.com/SciTools/iris/blob/9268ca9c70bd160c35c5099025d632ae33faf858/lib/iris/analysis/_regrid.py#L1087-L1089
The return dtype of _regrid is found through a similar promotion method, though with more caveats, and looks like it can change depending on what the _RegularGridInterpolator does. This gives us similar options:
- have `RectilinearRegridder.__call__` calculate the dtype it wants and pass this as an argument to `_regrid`.
- couple `RectilinearRegridder.__call__` and `_regrid` by computing the dtype twice.
- have `_regrid` catch a 0d input array as obviously from dask and return the correct type of array to satisfy it.
N.B. Where I've referred to "dtype" above, we'd also have to think about the type of the array (whether it's masked etc.) too, but that's easier as I think it's just whatever the source data was.
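A minimal sketch of what that could look like, assuming a hypothetical helper `make_meta` that is given some array of the same kind as the source data (masked or not) plus the already-chosen output dtype:

```python
import numpy as np


def make_meta(src, out_dtype):
    # Hypothetical helper: build a zero-size array with the same number
    # of dimensions as the source and the chosen output dtype.
    empty = np.empty((0,) * src.ndim, dtype=out_dtype)
    # Mirror the array type of the source: masked in, masked out.
    if isinstance(src, np.ma.MaskedArray):
        return np.ma.masked_array(empty)
    return empty


plain_meta = make_meta(np.zeros((2, 3), dtype=np.int16), np.float32)
masked_meta = make_meta(np.ma.masked_array([[1, 2]], mask=[[0, 1]]), np.float32)
```

The result would be the kind of object that could be passed as `meta`, carrying both the dtype and the array type without holding any data.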
Question
We should make the same choice in both places, and I don't think coupling the two functions is a good option, so do we:
- Pre-choose the dtype in the functions that call `map_complete_blocks` and have them pass a `meta` argument to `map_complete_blocks` (which can then pass it to `map_blocks`)?
- Let `map_blocks` throw a 0d array through the regridders, and spot it in there then return a 0d array of the right type?
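For comparison, the second choice might look roughly like the sketch below. `regrid_block` is a hypothetical stand-in for a regridder, with a placeholder calculation; it treats a 0d input as dask's probe and answers with a 0d array of the type it would really return.

```python
import numpy as np


def regrid_block(block):
    # Hypothetical regridder; the dtype promotion lives inside the function.
    out_dtype = np.promote_types(block.dtype, np.float16)

    if block.ndim == 0:
        # Assumed to be dask's type-inference probe: return a 0d array of
        # the right dtype and array type without doing any real work.
        probe = np.zeros((), dtype=out_dtype)
        if isinstance(block, np.ma.MaskedArray):
            return np.ma.masked_array(probe)
        return probe

    # Placeholder for the real regridding: collapse the last axis.
    return block.mean(axis=-1, keepdims=True).astype(out_dtype)


probe_result = regrid_block(np.array(1, dtype=np.int16))
real_result = regrid_block(np.ones((2, 4), dtype=np.int16))
```

This keeps the dtype logic in one function, at the cost of every regridder needing its own special case for input that only exists because of how dask infers types.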
In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.
If this issue is still important to you, then please comment on this issue and the stale label will be removed.
Otherwise this issue will be automatically closed in 28 days time.
This stale issue has been automatically closed due to a lack of community activity.
If you still care about this issue, then please either:
- Re-open this issue, if you have sufficient permissions, or
- Add a comment stating that this is still relevant and someone will re-open it on your behalf.