CustomScheduler does not warn about early computes via compute_meta
**Describe the bug**

No warning from the `CustomScheduler` utility in `satpy.tests.utils` reaches the user/developer when an (accidental) computation occurs via the dask `compute_meta` function.
**To Reproduce**

```python
import dask.config
import dask.array as da
import numpy as np
import xarray as xr

from satpy.tests.utils import CustomScheduler

cs = CustomScheduler(max_computes=0)
dabl = xr.DataArray(da.array([[True, True], [False, True]]), dims=("y", "x"))
dain = xr.DataArray(da.array([[0, 1], [2, 3]]), dims=("y", "x"))
with dask.config.set(scheduler=cs):
    for i in range(5):
        da.where(dabl, dain, np.nan)
print(cs.total_computes)
```
**Expected behavior**

I expect either that `cs.total_computes` equals zero, or that I'm told as soon as it becomes larger than `max_computes`.
**Actual results**

```
5
```
**Environment Info:**

- OS: Linux
- Satpy Version: main
**Additional context**

This isn't really Satpy's fault and I'm not sure what Satpy could do about it. `CustomScheduler` does raise a `RuntimeError`, but this one is swallowed by dask. We can tell by setting a breakpoint and inspecting the stack:
```
  File "/data/gholl/checkouts/protocode/mwe/custom-scheduler-doesnt-catch.py", line 12, in <module>
    da.where(dabl, dain, np.nan)
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/array/routines.py", line 2107, in where
    return elemwise(np.where, condition, x, y)
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/array/core.py", line 4831, in elemwise
    result = blockwise(
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/array/blockwise.py", line 286, in blockwise
    meta = compute_meta(func, dtype, *args[::2], **kwargs)
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/array/utils.py", line 140, in compute_meta
    meta = func(*args_meta, **kwargs_meta)
  File "<__array_function__ internals>", line 200, in where
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/xarray/core/common.py", line 165, in __array__
    return np.asarray(self.values, dtype=dtype)
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/xarray/core/dataarray.py", line 759, in values
    return self.variable.values
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/xarray/core/variable.py", line 616, in values
    return _as_array_or_item(self._data)
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/xarray/core/variable.py", line 309, in _as_array_or_item
    data = np.asarray(data)
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/array/core.py", line 1700, in __array__
    x = self.compute()
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/base.py", line 342, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/base.py", line 628, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/data/gholl/checkouts/satpy/satpy/tests/utils.py", line 288, in __call__
    raise RuntimeError("Too many dask computations were scheduled: "
```
However, in `dask.array.utils.compute_meta` we have:

```python
try:
    ...
    meta = func(*args_meta, **kwargs_meta)
    ...
except Exception:
    return None
```
which is why the user never sees the `RuntimeError` raised in the `CustomScheduler`, and the early compute (such as in #2614) goes unnoticed.
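The swallowing can be reproduced without dask at all. This is a minimal sketch, not dask's actual code; `swallowing_compute_meta` and `failing_scheduler` are hypothetical stand-ins for `dask.array.utils.compute_meta` and `CustomScheduler.__call__`:

```python
def swallowing_compute_meta(func, *args):
    """Simplified stand-in for dask.array.utils.compute_meta: the bare
    ``except Exception`` silences any error raised while computing the meta."""
    try:
        return func(*args)
    except Exception:
        # This is the branch that hides the CustomScheduler's RuntimeError.
        return None

def failing_scheduler():
    # Stand-in for CustomScheduler.__call__ once max_computes is exceeded.
    raise RuntimeError("Too many dask computations were scheduled")

# The RuntimeError never propagates; the caller just sees None.
result = swallowing_compute_meta(failing_scheduler)
print(result)  # None
```

The caller (`blockwise`) interprets `None` as "meta could not be determined" and carries on, so from the outside nothing happened.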
I'm not sure what kind of workaround could still inform the user.
Wow, that `except Exception` is really bad in dask. I don't see a reasonable way for us to raise any normal exception without it getting swallowed up by that.
Dask issue: https://github.com/dask/dask/issues/10595
One workaround in tests seems to be to call `arr.compute()` explicitly and use `max_computes=1`. In #2623, before the fix, the scheduler reported four computations when triggered. So cause one intended computation to reveal the hidden ones. Maybe.
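The logic of that workaround can be sketched without dask. `CountingScheduler` and `hidden_compute` below are hypothetical stand-ins for `satpy.tests.utils.CustomScheduler` and a compute triggered inside `compute_meta`, assuming the scheduler simply counts calls and raises once the count exceeds `max_computes`:

```python
class CountingScheduler:
    """Minimal stand-in for satpy.tests.utils.CustomScheduler."""

    def __init__(self, max_computes):
        self.max_computes = max_computes
        self.total_computes = 0

    def __call__(self):
        self.total_computes += 1
        if self.total_computes > self.max_computes:
            raise RuntimeError(
                "Too many dask computations were scheduled: "
                f"{self.total_computes}")

def hidden_compute(scheduler):
    # Stand-in for a compute triggered inside compute_meta:
    # any error raised by the scheduler is swallowed.
    try:
        scheduler()
    except Exception:
        return None

sched = CountingScheduler(max_computes=1)
for _ in range(4):          # four hidden computes, as reported in #2623
    hidden_compute(sched)   # RuntimeErrors (if any) are silenced here

try:
    sched()                 # the one intended, explicit compute
except RuntimeError as err:
    print(err)              # the error surfaces, revealing the hidden computes
```

The hidden computes each increment the counter silently; the single intended `compute()` then pushes the count past `max_computes=1` outside the swallowing `try`/`except`, so the `RuntimeError` finally reaches the test.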