New Dask Arrow-based strings cause test failures

Open jakirkham opened this issue 2 years ago • 14 comments

(Edited by @m-albert)

When pyarrow is installed, dask by default treats dataframe columns of dtype object as pyarrow strings (see https://github.com/dask/dask/issues/10139#issuecomment-1655928619).

This causes problems, as revealed by failing tests (e.g. test_dask_image/test_ndmeasure/test_find_objects.py::test_3d_find_objects).

https://github.com/dask/dask-image/blob/67540af25597f84e4a642d644ba30dce7aebe753/dask_image/ndmeasure/_utils/_find_objects.py#L68-L70

dd.from_delayed(df1, meta=meta).compute().dtypes

Working install:

0    object
1    object
2    object
dtype: object

Failing install:

0    string[pyarrow]
1    string[pyarrow]
2    string[pyarrow]
dtype: object

The failing test came up when releasing v2023.08.0 in https://github.com/conda-forge/dask-image-feedstock/pull/14.

@jakirkham found that pyarrow is installed with the conda distribution of dask, but not when installing via pip, where it is only part of the [complete] target.

@jakirkham also found that this conversion can be turned off through the dask configuration.

He did so for the tests run by the dask-image conda feedstock on v2023.08.0.
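
The configuration switch mentioned above could look roughly like the sketch below. The issue text does not name the option, so this assumes it is dask's `dataframe.convert-string` setting. The variable names `inside` and `after` are only for illustration.

```python
import dask

# Assumption: "dataframe.convert-string" is the dask config key that controls
# the automatic conversion of object-dtype columns to string[pyarrow].

# Option 1: disable the conversion temporarily, e.g. around a single test.
with dask.config.set({"dataframe.convert-string": False}):
    inside = dask.config.get("dataframe.convert-string")

# Option 2: disable the conversion for the rest of the process.
dask.config.set({"dataframe.convert-string": False})
after = dask.config.get("dataframe.convert-string")

print(inside, after)
```

If changing the code is not an option, the same setting can probably be supplied through dask's environment-variable convention, for example `DASK_DATAFRAME__CONVERT_STRING=False`, which may be more convenient in a CI or feedstock recipe.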

jakirkham, Aug 03 '23 21:08