
[bug from Hell] Serious issue with: concatenated cubes with masks and `dask=2024.8.0`

Open • valeriupredoi opened this issue 1 year ago • 11 comments

Hi folks,

  • MRE/MRC code:
import iris
import numpy as np


# cubb-1.nc / cubb-2.nc are the attached cubb-*.nc.txt files with ".txt" removed
c1 = iris.load_cube("cubb-1.nc")
c2 = iris.load_cube("cubb-2.nc")

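# random boolean slice along the first (time-like) axis of the concatenated cube (2 x 365 = 730 points);
# a new slice is drawn on every run, so whether fill values leak varies from run to run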
slicer = (
    np.random.choice(a=[False, True], size=(730,)),
    slice(None, None, None),
    slice(None, None, None),
    slice(None, None, None)
)

# can use this to slice each of cubes c1 and/or c2
slicer1 = (
    np.random.choice(a=[False, True], size=(365,)),
    slice(None, None, None),
    slice(None, None, None),
    slice(None, None, None)
)

cube = iris.cube.CubeList([c1, c2]).concatenate_cube()
sliced = cube[slicer]
# with dask==2024.8.0 some unmasked values here may equal the fill value (e.g. 1e+36)
print("After slicing", sliced.data)
  • environment:
iris                      3.9.0              pyha770c72_0    conda-forge
numpy                     1.26.4          py312heda63a1_0    conda-forge

dask                      2024.8.0           pyhd8ed1ab_0    conda-forge
dask-core                 2024.8.0           pyhd8ed1ab_0    conda-forge

or

dask                      2024.7.1           pyhd8ed1ab_0    conda-forge
dask-core                 2024.7.1           pyhd8ed1ab_0    conda-forge
  • problem: use the MRE code to reproduce this:
    • you have two cubes, one or both with masked data
    • concatenate them
    • apply a (time-like) slice
    • depending on the (random) slice, the resulting cube will sometimes contain fill values as real numerical data (i.e. not as masked elements) when using dask==2024.8.0, so one gets crazy values such as 1.e+36
    • this never happens with the previous release, dask==2024.7.1
    • the example above uses fairly hefty cubes (attached here for testing; just drop the .txt extension), but I am sure this can be scaled down to smaller cubes with the same behaviour

Good luck fixing this, folks! It took me two days to isolate it from ESMValCore, and I am sure it's not a very straightforward fix :grin: But it's an ugly bug that can bite badly! Attachments: cubb-1.nc.txt, cubb-2.nc.txt

valeriupredoi, Aug 08 '24 15:08
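The symptom in the report, fill values such as 1e+36 turning up as ordinary (unmasked) data, can be checked mechanically on the computed result. The sketch below is a hypothetical helper, not part of Iris or Dask. It assumes you know the fill value used in the files (the report only says values like 1.e+36 appear) and counts unmasked elements that match it:

```python
import numpy as np


def count_leaked_fill_values(data, fill_value, rtol=1e-6):
    """Hypothetical helper: count elements equal to ``fill_value`` that are NOT masked.

    ``fill_value`` is passed explicitly (e.g. the file's fill value, assumed here
    to be something like 1e+36) because once the mask is lost, the array's own
    fill_value may no longer match what was written to disk.
    """
    # Plain ndarrays become masked arrays with no masked points.
    arr = np.ma.asanyarray(data)
    values = np.ma.getdata(arr)
    unmasked = ~np.ma.getmaskarray(arr)
    # Relative tolerance so that a float32-stored 1e+36 still matches.
    looks_like_fill = np.isclose(values, fill_value, rtol=rtol, atol=0.0)
    return int(np.count_nonzero(looks_like_fill & unmasked))
```

For example, calling it on the sliced cube's `.data` with `1e36` should return 0 with an unaffected Dask; any positive count points at the leak. Plain (unmasked) arrays are accepted on purpose, since a dropped mask may come back that way.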

Label added: Status: Decision Required (https://github.com/SciTools/iris/labels/Status%3A%20Decision%20Required)

Historically, when we have raised mask-related issues with Dask, they have asked us to propose the fix, and we have found that you need to be highly experienced in Dask before you can work on it. None of us are highly experienced in Dask.

Options:

  • Record this operation as impossible in our documentation.
  • Remove some proposed features from Iris 3.11 so that one or two of us have time to work on fixing this (perhaps involving a Dask PR?)
  • Recognise that neither Dask nor NumPy really care about masks, and come up with an alternative way of providing this functionality that we have control over. E.g. flatten the array, filter out the 'masked' points, perform whatever operation, then reshape the array back to the correct shape?

trexfeathers, Aug 09 '24 08:08
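The third option can be sketched in plain NumPy for the simplest case, an elementwise operation. `apply_on_unmasked` is a hypothetical name; the sketch only shows the flatten, filter, operate, reshape bookkeeping and makes no claim about how Iris would wire this into lazy (Dask) data:

```python
import numpy as np


def apply_on_unmasked(marr, func):
    """Hypothetical sketch of option 3 for an elementwise ``func``.

    Flatten the array, keep only the unmasked points, apply ``func`` to them,
    write the results back and reshape, restoring the original mask.
    """
    marr = np.ma.asanyarray(marr)
    mask = np.ma.getmaskarray(marr)           # full boolean mask, same shape
    flat_values = np.ma.getdata(marr).ravel()
    flat_mask = mask.ravel()

    valid = flat_values[~flat_mask]           # 1-D array of unmasked points only
    result_valid = np.asarray(func(valid))
    if result_valid.shape != valid.shape:
        raise ValueError("func must be elementwise (one output per input point)")

    # Copy so the input is untouched; masked slots keep their old, ignored data.
    out = flat_values.astype(result_valid.dtype, copy=True)
    out[~flat_mask] = result_valid
    return np.ma.masked_array(out.reshape(marr.shape), mask=mask)
```

Operations that move points around (slicing, concatenation, reductions) would additionally need to carry the original index of each kept point, which is where most of the real work would lie.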

@trexfeathers very many thanks for looking into this! :beer:

I know your pain with masks: masked arrays are a lot more important than NumPy/Dask folk consider them to be :grin: Let me know if I can help with testing the fix, etc.

valeriupredoi, Aug 09 '24 12:08

I believe I have narrowed this down: https://github.com/dask/dask/issues/11296

rcomer, Aug 10 '24 11:08

@rcomer that's the one indeed, excellent detective work!

valeriupredoi, Aug 12 '24 13:08

@valeriupredoi there is now a fix on dask main branch. Are you able to test your case(s) with that?

rcomer, Aug 12 '24 17:08

@rcomer let me test with that, cheers :beer:

valeriupredoi, Aug 13 '24 12:08

@rcomer dask 2024.8.0+13.g67b2852 installed from source does the trick perfectly - excellent work on this :beer:

valeriupredoi, Aug 13 '24 12:08

From @scitools/peloton: cross-chunk slicing has come up before. Before this can be closed we should write a test to catch cases like this.

trexfeathers, Aug 14 '24 09:08
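A regression test along these lines could compare slicing across a concatenation boundary against an eager NumPy reference. The sketch below is an assumption about how such a test might look, not the test Iris adopted: `check_cross_chunk_slicing` and the `concat_and_index` hook are hypothetical, and the random data uses a fixed seed so that, unlike a freshly drawn random slice, a failure can be replayed.

```python
import numpy as np


def reference_concat_and_index(pieces, index):
    """Eager reference: concatenate masked pieces with NumPy, then index."""
    return np.ma.concatenate(pieces, axis=0)[index]


def check_cross_chunk_slicing(concat_and_index, n_per_piece=(5, 7),
                              mask_fraction=0.3, seed=0):
    """Compare an implementation under test against the NumPy reference.

    ``concat_and_index(pieces, index)`` is a hypothetical hook that must return
    a computed (masked or plain) array, e.g. from a Dask-backed implementation.
    """
    rng = np.random.default_rng(seed)  # fixed seed: failures are reproducible
    pieces = []
    for n in n_per_piece:
        data = rng.normal(size=(n, 3))
        mask = rng.random(size=(n, 3)) < mask_fraction
        pieces.append(np.ma.masked_array(data, mask=mask, fill_value=1e36))
    # Boolean index along the concatenated (time-like) axis, spanning all pieces.
    index = rng.random(sum(n_per_piece)) < 0.5

    expected = reference_concat_and_index(pieces, index)
    actual = np.ma.asanyarray(concat_and_index(pieces, index))

    assert actual.shape == expected.shape
    # Masks must agree exactly; a lost mask shows up here as a fill-value leak.
    np.testing.assert_array_equal(np.ma.getmaskarray(actual),
                                  np.ma.getmaskarray(expected))
    # Unmasked values must agree; data under the mask is ignored.
    np.testing.assert_array_equal(actual.compressed(), expected.compressed())
```

To exercise Dask, one could pass a `concat_and_index` that wraps each piece with `dask.array.from_array`, joins them with `dask.array.concatenate`, indexes lazily and calls `.compute()`; that wiring is an untested assumption here. An implementation that returns `.filled(...)` data instead of a masked array should make the mask assertion fail.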

This is one of the ESMValTool tests that failed:

https://github.com/ESMValGroup/ESMValCore/blob/f969e82796f5f3e47c97169b635ef6bb8b8a5eb1/tests/sample_data/multimodel_statistics/test_multimodel.py#L219-L226

schlunma, Aug 14 '24 09:08

From @SciTools/peloton we can maybe add a negative pin for the problem.
This would go in 3.11

pp-mo, Aug 28 '24 09:08

> From @SciTools/peloton we can maybe add a negative pin for the problem. This would go in 3.11

A very positive take on it :grin: (sorry, I was on holiday until a few days ago)

valeriupredoi, Sep 19 '24 14:09