Use `meta` correctly in `map_blocks` to prevent dask from passing a 0d array through the regridder

Open wjbenfold opened this issue 3 years ago • 3 comments

Context

dask.array.map_blocks will, under some circumstances, pass a 0d array through the provided function when it is initialised, as documented in https://docs.dask.org/en/latest/generated/dask.array.map_blocks.html. In https://github.com/SciTools/iris/blob/4abaa8f5d4be918b2e0b7bd42fbcc1e0a0196dd6/lib/iris/_lazy_data.py#L355-L388 we call map_blocks without meta, including when handing it an area weighted regridding function (and presumably other functions) that isn't written to accept a 0d array. We do the same elsewhere in the same file too.
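For illustration, here is a minimal sketch of passing meta (and dtype) to map_blocks so that dask has no need to probe the function while building the graph. The function double_as_float and the array shapes are made up for this example and are not Iris code; the only dask calls used are da.ones and da.map_blocks.

```python
import dask.array as da
import numpy as np

shapes_seen = []  # every block shape the function is called with


def double_as_float(block):
    # Hypothetical stand-in for a regridding function (not Iris code).
    shapes_seen.append(block.shape)
    return block.astype(np.float64) * 2


src = da.ones((4, 6), chunks=(2, 6), dtype=np.float32)

# meta is a zero-size array with the type, dtype and number of
# dimensions that the output blocks are expected to have.
meta = np.empty((0, 0), dtype=np.float64)
lazy = da.map_blocks(double_as_float, src, dtype=np.float64, meta=meta)

calls_at_build = list(shapes_seen)  # calls made while building the graph
result = lazy.compute()  # the function now runs once per real chunk
```

With meta supplied, the function should only ever see the real (2, 6) chunks, so it never has to cope with a placeholder input.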

Issues arising

  • #4574 documents a deprecation warning seen when the Iris tests are run as the 0d array is passed in by dask and then indexed.
  • I don't know whether adding this gives us performance or safety improvements; it's more that we'd be "doing it properly" / using dask as designed. It seems like a good way to make the codebase easier to understand, though.

Suggestions

  • Work out how to choose what the meta kwarg should be set to, and set it
  • Consider whether the dtype argument should also be provided

wjbenfold avatar Feb 23 '22 12:02 wjbenfold

Turns out I was wrong the other day when I told @trexfeathers that all this needed was working out, and that it didn't need deciding first...

Investigation

map_complete_blocks is called twice within Iris:

_area_weighted

https://github.com/SciTools/iris/blob/2fa7c598d4bcbe7e861b12b28a874360af974421/lib/iris/analysis/_area_weighted.py#L1109-L1111

The return dtype is set within area weighted regridding using np.promote_types(src_data.dtype, np.float16). We could:

  • have _regrid_area_weighted_rectilinear_src_and_grid__perform calculate the dtype it wants and pass this as an argument to _regrid_area_weighted_array.
  • tightly couple _regrid_area_weighted_rectilinear_src_and_grid__perform and _regrid_area_weighted_array by computing the promotion twice, and then passing it as a kwarg to map_complete_blocks.
  • have _regrid_area_weighted_array catch a 0d input array as obviously from dask and return the correct type of array to satisfy it.

_regrid

https://github.com/SciTools/iris/blob/9268ca9c70bd160c35c5099025d632ae33faf858/lib/iris/analysis/_regrid.py#L1087-L1089

The return dtype of _regrid is found through a similar promotion method, though with more caveats, and looks like it can change depending on what the _RegularGridInterpolator does. This gives us similar options:

  • have RectilinearRegridder.__call__ calculate the dtype it wants and pass this as an argument to _regrid.
  • couple RectilinearRegridder.__call__ and _regrid by computing the dtype twice.
  • have _regrid catch a 0d input array as obviously from dask and return the correct type of array to satisfy it.

N.B. Where I've referred to "dtype" above, we'd also have to think about the type of the array (whether it's masked etc.) too, but that's easier as I think it's just whatever the source data was.
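As a rough sketch of what "calculate the dtype it wants" could look like for the area weighted case, the helper below applies the same np.promote_types(src_dtype, np.float16) promotion and wraps the result in a zero-size array suitable for use as meta. The name make_meta and its masked flag are hypothetical, and this does not attempt to cover the extra caveats mentioned for _regrid.

```python
import numpy as np


def make_meta(src_dtype, ndim, masked=False):
    # Hypothetical helper: build a zero-size array describing the output.
    # The promotion mirrors the one quoted above for area weighted regridding.
    out_dtype = np.promote_types(src_dtype, np.float16)
    empty = np.empty((0,) * ndim, dtype=out_dtype)
    # Assumption from the N.B.: the output is masked if the source was.
    return np.ma.masked_array(empty) if masked else empty


int_meta = make_meta(np.int32, ndim=2)       # int32 promotes to float64
float_meta = make_meta(np.float32, ndim=2)   # float32 stays float32
masked_meta = make_meta(np.float64, ndim=3, masked=True)
```

The caller would compute this once from the source data and hand it down, which keeps the dtype decision in a single place.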

Question

We should make the same choice in both places, and I don't think coupling the two functions is a good choice, so do we:

  1. Pre-choose the dtype in the functions that call map_complete_blocks and have them pass a meta argument to map_complete_blocks (which can then pass it to map_blocks)?
  2. Let map_blocks throw a 0d array through the regridders, and spot it in there then return a 0d array of the right type?
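To make option 2 concrete, here is a hedged sketch of a function that spots a probe input and returns early. regrid_stub is a hypothetical stand-in, not the Iris regridder, and treating a 0d or zero-size block as "obviously from dask" is an assumption of this example. Option 1 would instead keep the function unchanged and pass meta from the caller.

```python
import numpy as np


def regrid_stub(block):
    # Hypothetical stand-in for a regridder; not the Iris implementation.
    out_dtype = np.promote_types(block.dtype, np.float16)
    if block.ndim == 0 or block.size == 0:
        # Assumed to be dask's probe: skip the real work and return an
        # empty array with the same dimensionality and the right dtype.
        return np.empty((0,) * block.ndim, dtype=out_dtype)
    # Placeholder for the real regridding calculation.
    return block.astype(out_dtype)


probe = regrid_stub(np.array(1, dtype=np.int32))
real = regrid_stub(np.arange(6, dtype=np.float32).reshape(2, 3))
```

The cost of this approach is that every function handed to map_blocks needs its own guard, whereas passing meta keeps that knowledge with the caller.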

wjbenfold avatar Mar 04 '22 14:03 wjbenfold

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days time.

github-actions[bot] avatar Nov 04 '23 00:11 github-actions[bot]

This stale issue has been automatically closed due to a lack of community activity.

If you still care about this issue, then please either:

  • Re-open this issue, if you have sufficient permissions, or
  • Add a comment stating that this is still relevant and someone will re-open it on your behalf.

github-actions[bot] avatar Dec 02 '23 00:12 github-actions[bot]