iris
iris copied to clipboard
[bug from Hell] Serious issue with: concatenated cubes with masks and `dask=2024.8.0`
Hi folks,
- MRE/MRC code:
import iris
import numpy as np
c1 = iris.load_cube("cubb-1.nc")
c2 = iris.load_cube("cubb-2.nc")
# apply slice to concatenated cube
slicer = (
np.random.choice(a=[False, True], size=(730,)),
slice(None, None, None),
slice(None, None, None),
slice(None, None, None)
)
# can use this to slice each of cubes c1 and/or c2
slicer1 = (
np.random.choice(a=[False, True], size=(365,)),
slice(None, None, None),
slice(None, None, None),
slice(None, None, None)
)
cube = iris.cube.CubeList([c1, c2]).concatenate_cube()
cubes = cube[slicer]
print("After slicing", cubes.data)
- environment:
iris 3.9.0 pyha770c72_0 conda-forge
numpy 1.26.4 py312heda63a1_0 conda-forge
dask 2024.8.0 pyhd8ed1ab_0 conda-forge
dask-core 2024.8.0 pyhd8ed1ab_0 conda-forge
or
dask 2024.7.1 pyhd8ed1ab_0 conda-forge
dask-core 2024.7.1 pyhd8ed1ab_0 conda-forge
- problem: use the MRE code to reproduce this:
- you have two cubes, one or both with masked data
- concatenate them
- apply a (time-like) slice
- resulting cube will randomly (depending on slice) have fill values explicitly used as numerical data values (ie not as masked elements) when using
dask==2024.8.0, so one gets cray values of 1.e+36 etc - this behaviour never happens with previous
dask==2024.7.1 - example above uses fairly hefty cubes (attached here for testing, just change the extension), but am sure this can be scaled-down to smaller cubes, with the same behaviour
Good luck fixing this folks, it took me two days to isolate it from ESMValCore, am sure it's not a very straightforward fix :grin: But it's an ugly bug that can bite badly! cubb-1.nc.txt cubb-2.nc.txt
https://github.com/SciTools/iris/labels/Status%3A%20Decision%20Required
Historically when we have raised mask-related issues with Dask, they have asked us to propose the fix. Historically we have found that you need to be highly experienced in Dask before you can work on it. None of us are highly experienced in Dask.
Options:
- Record this operation as impossible in our documentation.
- Remove some proposed features from Iris 3.11 so that one/two of us have time to work on fixing this (perhaps involving a Dask PR?)
- Recognise that neither Dask nor NumPy really care about masks and come up with alternative way of providing this functionality that we have control over. E.g. flatten array, filter out 'masked' points, perform whatever operation, ravel array back to the correct shape?
@trexfeathers very many thanks for looking into this! :beer:
I know you guys' masked pain - masked arrays are a lot more important than what Numpy/Dask folk consider them to be :grin: Let me know if I can help with testing the fix etc
I believe I have narrowed this down: https://github.com/dask/dask/issues/11296
@rcomer that's the one indeed, excellent detective work!
@valeriupredoi there is now a fix on dask main branch. Are you able to test your case(s) with that?
@rcomer let me try test with that, cheers :beer:
@rcomer dask 2024.8.0+13.g67b2852 installed from source does the trick perfectly - excellent work on this :beer:
From @scitools/peloton: cross-chunk slicing has come up before. Before this can be closed we should write a test to catch cases like this.
This is one of the ESMValTool tests that failed:
https://github.com/ESMValGroup/ESMValCore/blob/f969e82796f5f3e47c97169b635ef6bb8b8a5eb1/tests/sample_data/multimodel_statistics/test_multimodel.py#L219-L226
From @SciTools/peloton we can maybe add a negative pin for the problem.
This would go in 3.11
From @SciTools/peloton we can maybe add a negative pin for the problem. This would go in 3.11
a very positive take on it :grin: (sorry, I was on holidays until a few days ago)