dask-image
New Dask Arrow-based strings cause test failures
(Edited by @m-albert)
When pyarrow is installed, dask by default converts dataframe columns of dtype object to pyarrow strings (see https://github.com/dask/dask/issues/10139#issuecomment-1655928619).
This causes problems, as revealed by failing tests (e.g. test_dask_image/test_ndmeasure/test_find_objects.py::test_3d_find_objects).
https://github.com/dask/dask-image/blob/67540af25597f84e4a642d644ba30dce7aebe753/dask_image/ndmeasure/_utils/_find_objects.py#L68-L70
dd.from_delayed(df1, meta=meta).compute().dtypes
Working install:
0    object
1    object
2    object
dtype: object
Failing install:
0    string[pyarrow]
1    string[pyarrow]
2    string[pyarrow]
dtype: object
The failing test came up when releasing v2023.08.0 in https://github.com/conda-forge/dask-image-feedstock/pull/14.
@jakirkham found that pyarrow is installed with the conda distribution of dask, but not when installing via pip, where it is only part of the [complete] target.
@jakirkham also found that the conflicting behaviour described above can be turned off through the dask configuration.
He did this for the tests run by the dask-image conda feedstock on v2023.08.0.
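For reference, here is a minimal sketch of what turning the conversion off could look like in Python. It assumes the relevant option is the `dataframe.convert-string` key discussed in the linked dask issue; the helper name `convert_string_disabled` is hypothetical, and this is not necessarily how the feedstock configures its tests.

```python
import dask

# Assumption: this is the dask configuration key that controls the automatic
# conversion of object-dtype columns to string[pyarrow].
CONVERT_STRING_KEY = "dataframe.convert-string"


def convert_string_disabled():
    """Hypothetical helper: context manager that switches the conversion off."""
    return dask.config.set({CONVERT_STRING_KEY: False})


with convert_string_disabled():
    # Dataframes created inside this block should keep their object dtype,
    # e.g. the dd.from_delayed(df1, meta=meta) call shown above.
    inside = dask.config.get(CONVERT_STRING_KEY)

print(inside)
```

In a test environment the same option can probably also be set without code changes, through the `DASK_DATAFRAME__CONVERT_STRING=False` environment variable or a dask YAML config file, which may suit a CI setup better.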