
For some schedulers, setting PIMS image reader's `.class_priority` is ineffective in controlling `dask-image.imread()`

Open ParticularMiner opened this issue 3 years ago • 15 comments

cc: @jmdelahanty

Hi dask-image developers!

Normally an end-user may control which reader pims.open() uses to load images by simply increasing the .class_priority attribute of their preferred pims reader prior to calling pims.open(). See this link.

import pims

pims.ImageIOReader.class_priority = 100  # we set this very high in order to force pims.open() to use this reader
rgb_frames = pims.open('/path/to/video/file.mpg')  # uses ImageIOReader

Since dask-image.imread() uses pims.open(), it would be great if it could mirror such functionality too.

import dask_image.imread
import pims

pims.ImageIOReader.class_priority = 100  # we set this very high in order to force dask's imread() to use this reader [via pims.open()]
rgb_frames = dask_image.imread.imread('/path/to/video/file.mpg')  # uses ImageIOReader

And indeed this functionality does work for dask-image.imread() with single-process schedulers, like "threading" and "sync". But I do not know of a way to make every worker process of a multi-process scheduler, for example, aware of the preferred reader's increased .class_priority. Any help here would be greatly appreciated.

Alternatively, it might be worth modifying dask-image.imread() to accept a "reader" keyword argument that indicates the end-user's preferred PIMS reader.
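To make the two options concrete, here is a toy model of priority-based reader selection plus the explicit `reader` override proposed above. The classes `ToyPyAVReader` and `ToyImageIOReader`, their priority values, and the helper `pick_reader` are all hypothetical; this is not pims's actual code. As far as I understand, pims.open() does something similar: it keeps the readers that support the file extension and tries them in descending class_priority order (falling back to the next one on failure, which this sketch omits).

```python
class ToyPyAVReader:
    """Hypothetical stand-in for a PyAV-based pims reader (made-up priority)."""
    class_exts = {"mp4", "mpg"}
    class_priority = 8


class ToyImageIOReader:
    """Hypothetical stand-in for pims.ImageIOReader (made-up priority)."""
    class_exts = {"mp4", "mpg"}
    class_priority = 6


ALL_READERS = [ToyPyAVReader, ToyImageIOReader]


def pick_reader(path, readers=ALL_READERS, reader=None):
    """Choose a reader class for `path`.

    An explicit `reader` (the proposed keyword argument) wins outright.
    Otherwise the supporting class with the highest class_priority wins.
    """
    if reader is not None:
        return reader
    ext = path.rsplit(".", 1)[-1].lower()
    candidates = [r for r in readers if ext in r.class_exts]
    if not candidates:
        raise ValueError(f"no reader supports .{ext} files")
    return max(candidates, key=lambda r: r.class_priority)


print(pick_reader("file.mpg").__name__)  # ToyPyAVReader
ToyImageIOReader.class_priority = 100    # the trick used in this issue
print(pick_reader("file.mpg").__name__)  # ToyImageIOReader
print(pick_reader("file.mpg", reader=ToyPyAVReader).__name__)  # ToyPyAVReader
```

The design difference matters for multi-process schedulers: an explicit reader would be passed along with each call (and, inside a dask graph, pickled into every task), so it reaches worker processes unchanged. A class attribute changed in the parent only takes effect in workers that share or inherit that memory, or that re-apply the change themselves.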

ParticularMiner, Apr 30 '22 15:04

Hi @ParticularMiner

I see how that would be useful. Have you tried using the dask.array.image.imread function (regular dask, not the one in dask-image)? It allows you to pass in your preferred reader function directly, which seems even easier than fiddling with the priority levels.

from dask.array.image import imread
import pims

data = imread('path/to/files/*.tif', imread=pims.ImageIOReader)

(Having two different imread functions in two different places kinda violates the python zen "There should be one-- and preferably only one --obvious way to do it", which I don't like. You can read some more discussion about that here if you like: https://github.com/dask/dask-image/issues/229)

GenevieveBuckley, May 10 '22 06:05

Let us know if that fixes your issue (and feel free to let us know if you have opinions about https://github.com/dask/dask-image/issues/229. Development has stalled now that I'm no longer working full time on dask stuff, but it's still good to hear from people.)

GenevieveBuckley, May 10 '22 07:05

Hi @GenevieveBuckley ,

Thank you! Until now, I had been unaware of dask.array.image.imread().

The API of dask.array.image.imread() is certainly attractive, in that it allows the use of other readers. But it would be great if it also had some of the other keyword arguments of dask_image.imread.imread(). I do agree that dask should have only one such function. And since dask-image presumably deals with all things image, perhaps it would make sense to merge dask.array.image.imread() into dask_image.imread.imread().

Unfortunately though, as it is now, dask.array.image.imread() raised an exception while reading a video file which dask_image.imread.imread() had no problem reading:

from dask.array.image import imread
import pims


video = imread('path/to/video.mp4', imread=pims.ImageIOReader)
Error messages:
dask\array\image.py:58: in imread
    keys = [(name, i) + (0,) * len(sample.shape) for i in range(len(filenames))]
        filename   = 'path/to/video.mp4'
        filenames  = ['path/to/video.mp4']
        imread     = <class 'pims.imageio_reader.ImageIOReader'>
        name       = 'imread-baa7a8184312ac7b15459beea41cbd90'
        preprocess = None
        sample     = <FramesSequenceND>
Axes: 3
Axis 'x' size: 1920
Axis 'y' size: 1080
Axis 't' size: 851
Pixel Datatype: uint8
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

.0 = <range_iterator object at 0x000002C0CE3F6730>

>   keys = [(name, i) + (0,) * len(sample.shape) for i in range(len(filenames))]
E   AttributeError: 'ImageIOReader' object has no attribute 'shape'

.0         = <range_iterator object at 0x000002C0CE3F6730>
i          = 0
name       = 'imread-baa7a8184312ac7b15459beea41cbd90'
sample     = <FramesSequenceND>
Axes: 3
Axis 'x' size: 1920
Axis 'y' size: 1080
Axis 't' size: 851
Pixel Datatype: uint8

dask\array\image.py:58: AttributeError

ParticularMiner, May 10 '22 17:05

  1. Are you able to share a small example video file? I've tried using some of the demo video files available here, but wasn't able to reproduce the error you show above.

  2. Can you share the output from conda list and/or pip list? Knowing which versions you have for the different python libraries would be helpful.

GenevieveBuckley, May 11 '22 04:05

Sure.

  • Download video file: test_vid.mp4. I was also unable to open the demo video files at the link you provided.

  • conda list

    # packages in environment at /path/to/conda/environment:
    #
    # Name Version Build Channel
    aiohttp 3.8.1 py39hb82d6ee_0 conda-forge
    aiosignal 1.2.0 pyhd8ed1ab_0 conda-forge
    anyio 3.5.0 py39hcbf5309_0 conda-forge
    aom 3.3.0 h0e60522_1 conda-forge
    apptools 5.1.0 pyh44b312d_0 conda-forge
    argon2-cffi 21.3.0 pyhd8ed1ab_0 conda-forge
    argon2-cffi-bindings 21.2.0 py39hb82d6ee_1 conda-forge
    asciitree 0.3.3 py_2 conda-forge
    astroid 2.8.6 py39hcbf5309_1 conda-forge
    asttokens 2.0.5 pyhd8ed1ab_0 conda-forge
    async-timeout 4.0.2 pyhd8ed1ab_0 conda-forge
    atomicwrites 1.4.0 pyh9f0ad1d_0 conda-forge
    attrs 21.4.0 pyhd8ed1ab_0 conda-forge
    babel 2.9.1 pyh44b312d_0 conda-forge
    backcall 0.2.0 pyh9f0ad1d_0 conda-forge
    backports 1.0 py_2 conda-forge
    backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge
    beautifulsoup4 4.10.0 pyha770c72_0 conda-forge
    black 21.12b0 pyhd8ed1ab_0 conda-forge
    bleach 4.1.0 pyhd8ed1ab_0 conda-forge
    blosc 1.21.0 h0e60522_0 conda-forge
    bokeh 2.4.2 py39hcbf5309_0 conda-forge
    brotlipy 0.7.0 py39hb82d6ee_1003 conda-forge
    bzip2 1.0.8 h8ffe710_4 conda-forge
    c-blosc2 2.0.4 h09319c2_1 conda-forge
    ca-certificates 2021.10.8 h5b45459_0 conda-forge
    cairo 1.16.0 hb19e0ff_1008 conda-forge
    certifi 2021.10.8 py39hcbf5309_2 conda-forge
    cffi 1.15.0 py39h0878f49_0 conda-forge
    cfgv 3.3.1 pyhd8ed1ab_0 conda-forge
    cfitsio 4.1.0 h5a969a9_0 conda-forge
    chardet 4.0.0 py39hcbf5309_2 conda-forge
    charls 2.3.4 h39d44d4_0 conda-forge
    charset-normalizer 2.0.7 pyhd8ed1ab_0 conda-forge
    click 8.0.4 py39hcbf5309_0 conda-forge
    cloudpickle 2.0.0 pyhd8ed1ab_0 conda-forge
    colorama 0.4.4 pyh9f0ad1d_0 conda-forge
    conda 4.11.0 py39hcbf5309_0 conda-forge
    conda-build 3.21.7 py39hcbf5309_0 conda-forge
    conda-package-handling 1.8.0 py39hb3671d1_0 conda-forge
    configobj 5.0.6 py_0 conda-forge
    coverage 6.3.2 py39hb82d6ee_1 conda-forge
    cryptography 36.0.1 py39h7bc7c5c_0 conda-forge
    curl 7.82.0 h789b8ee_0 conda-forge
    cycler 0.11.0 pyhd8ed1ab_0 conda-forge
    cytoolz 0.11.2 py39hb82d6ee_1 conda-forge
    dask 2022.3.0+8.gad98d4ac.dirty dev_0
    dask-core 2022.2.1 pyhd3eb1b0_0
    dask-image 2021.12.0 pyhd8ed1ab_0 conda-forge
    dataclasses 0.8 pyhc8e2a94_3 conda-forge
    debugpy 1.5.1 py39h415ef7b_0 conda-forge
    decorator 5.1.1 pyhd8ed1ab_0 conda-forge
    defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge
    distlib 0.3.4 pyhd8ed1ab_0 conda-forge
    distributed 2022.2.1 pyhd8ed1ab_0 conda-forge
    donfig 0.6.0 pyhd8ed1ab_0 conda-forge
    double-conversion 3.2.0 h0e60522_0 conda-forge
    eigen 3.4.0 h2d74725_0 conda-forge
    entrypoints 0.4 pyhd8ed1ab_0 conda-forge
    envisage 6.0.1 pyhd8ed1ab_0 conda-forge
    executing 0.8.3 pyhd8ed1ab_0 conda-forge
    expat 2.4.7 h39d44d4_0 conda-forge
    fasteners 0.17.3 pyhd8ed1ab_0 conda-forge
    ffmpeg 4.3.1 ha925a31_0 conda-forge
    filelock 3.6.0 pyhd8ed1ab_0 conda-forge
    flake8 3.9.2 pyhd8ed1ab_0 conda-forge
    flit-core 3.7.1 pyhd8ed1ab_0 conda-forge
    font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
    font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
    font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
    font-ttf-ubuntu 0.83 hab24e00_0 conda-forge
    fontconfig 2.13.96 hce3cb01_2 conda-forge
    fonts-conda-ecosystem 1 0 conda-forge
    fonts-conda-forge 1 0 conda-forge
    freetype 2.10.4 h546665d_1 conda-forge
    fribidi 1.0.10 h8d14728_0 conda-forge
    frozenlist 1.3.0 py39hb82d6ee_0 conda-forge
    fsspec 2022.2.0 pyhd8ed1ab_0 conda-forge
    getopt-win32 0.1 h8ffe710_0 conda-forge
    gettext 0.19.8.1 ha2e2712_1008 conda-forge
    giflib 5.2.1 h8d14728_2 conda-forge
    gl2ps 1.4.2 h0597ee9_0 conda-forge
    glew 2.1.0 h39d44d4_2 conda-forge
    glob2 0.7 py_0 conda-forge
    graphblas 5.1.10 h0e60522_0 conda-forge
    graphite2 1.3.13 1000 conda-forge
    graphviz 2.50.0 hefbd956_1 conda-forge
    gts 0.7.6 h7c369d9_2 conda-forge
    harfbuzz 3.1.1 hc601d6f_0 conda-forge
    hdf4 4.2.15 h0e5069d_3 conda-forge
    hdf5 1.12.1 nompi_h2a0e4a3_104 conda-forge
    heapdict 1.0.1 py_0 conda-forge
    icu 68.2 h0e60522_0 conda-forge
    identify 2.4.12 pyhd8ed1ab_0 conda-forge
    idna 3.3 pyhd8ed1ab_0 conda-forge
    imagecodecs 2022.2.22 py39h279a0da_3 conda-forge
    imageio 2.16.2 pyhcf75d05_0 conda-forge
    imageio-ffmpeg 0.4.5 pyhd8ed1ab_0 conda-forge
    importlib-metadata 4.11.3 py39hcbf5309_0 conda-forge
    importlib_metadata 4.11.3 hd8ed1ab_1 conda-forge
    importlib_resources 5.4.0 pyhd8ed1ab_0 conda-forge
    iniconfig 1.1.1 pyh9f0ad1d_0 conda-forge
    intel-openmp 2022.0.0 h57928b3_3663 conda-forge
    ipykernel 6.9.2 py39h832f523_0 conda-forge
    ipympl 0.8.8 pyhd8ed1ab_0 conda-forge
    ipython 8.1.1 py39hcbf5309_0 conda-forge
    ipython_genutils 0.2.0 py_1 conda-forge
    ipywidgets 7.7.0 pyhd8ed1ab_0 conda-forge
    isort 5.10.1 pyhd8ed1ab_0 conda-forge
    jbig 2.1 h8d14728_2003 conda-forge
    jedi 0.18.1 py39hcbf5309_0 conda-forge
    jinja2 3.0.3 pyhd8ed1ab_0 conda-forge
    joblib 1.1.0 pyhd8ed1ab_0 conda-forge
    jpeg 9e h8ffe710_0 conda-forge
    json5 0.9.5 pyh9f0ad1d_0 conda-forge
    jsoncpp 1.9.5 h2d74725_1 conda-forge
    jsonschema 4.4.0 pyhd8ed1ab_0 conda-forge
    jupyter_client 7.1.2 pyhd8ed1ab_0 conda-forge
    jupyter_core 4.9.2 py39hcbf5309_0 conda-forge
    jupyter_server 1.13.5 pyhd8ed1ab_0 conda-forge
    jupyterlab 3.2.4 pyhd8ed1ab_0 conda-forge
    jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge
    jupyterlab_server 2.11.0 pyhd8ed1ab_0 conda-forge
    jupyterlab_widgets 1.1.0 pyhd8ed1ab_0 conda-forge
    jxrlib 1.1 h8ffe710_2 conda-forge
    kaleido 0.2.1 pypi_0 pypi
    kiwisolver 1.4.0 py39h2e07f2f_0 conda-forge
    krb5 1.19.3 h1176d77_0 conda-forge
    lazy-object-proxy 1.7.1 py39hb82d6ee_0 conda-forge
    lcms2 2.12 h2a16943_0 conda-forge
    lerc 3.0 h0e60522_0 conda-forge
    libaec 1.0.6 h39d44d4_0 conda-forge
    libarchive 3.5.2 hb45042f_1 conda-forge
    libavif 0.10.0 h8ffe710_1 conda-forge
    libblas 3.9.0 13_win64_mkl conda-forge
    libbrotlicommon 1.0.9 h8ffe710_7 conda-forge
    libbrotlidec 1.0.9 h8ffe710_7 conda-forge
    libbrotlienc 1.0.9 h8ffe710_7 conda-forge
    libcblas 3.9.0 13_win64_mkl conda-forge
    libclang 11.1.0 default_h5c34c98_1 conda-forge
    libcurl 7.82.0 h789b8ee_0 conda-forge
    libdeflate 1.10 h8ffe710_0 conda-forge
    libffi 3.4.2 h8ffe710_5 conda-forge
    libflang 5.0.0 h6538335_20180525 conda-forge
    libgd 2.3.3 h8bb91b0_0 conda-forge
    libglib 2.70.2 h3be07f2_4 conda-forge
    libiconv 1.16 he774522_0 conda-forge
    liblapack 3.9.0 13_win64_mkl conda-forge
    liblief 0.11.5 h0e60522_1 conda-forge
    libnetcdf 4.8.1 nompi_h1cc8e9d_101 conda-forge
    libogg 1.3.4 h8ffe710_1 conda-forge
    libpng 1.6.37 h1d00b33_2 conda-forge
    libsodium 1.0.18 h8d14728_1 conda-forge
    libssh2 1.10.0 h680486a_2 conda-forge
    libtheora 1.1.1 h8d14728_1005 conda-forge
    libtiff 4.3.0 hc4061b1_3 conda-forge
    libwebp 1.2.2 h57928b3_0 conda-forge
    libwebp-base 1.2.2 h8ffe710_1 conda-forge
    libxcb 1.13 hcd874cb_1004 conda-forge
    libxml2 2.9.12 hf5bbc77_1 conda-forge
    libzip 1.8.0 hfed4ece_1 conda-forge
    libzlib 1.2.11 h8ffe710_1013 conda-forge
    libzopfli 1.0.3 h0e60522_0 conda-forge
    llvm-meta 5.0.0 0 conda-forge
    llvmlite 0.37.0 py39ha0cd8c8_0 conda-forge
    locket 0.2.0 py_2 conda-forge
    loguru 0.6.0 py39hcbf5309_1 conda-forge
    lz4-c 1.9.3 h8ffe710_1 conda-forge
    lzo 2.10 he774522_1000 conda-forge
    m2-msys2-runtime 2.5.0.17080.65c939c 3 conda-forge
    m2-patch 2.7.5 2 conda-forge
    m2w64-gcc-libgfortran 5.3.0 6 conda-forge
    m2w64-gcc-libs 5.3.0 7 conda-forge
    m2w64-gcc-libs-core 5.3.0 7 conda-forge
    m2w64-gmp 6.1.0 2 conda-forge
    m2w64-libwinpthread-git 5.0.0.4634.697f757 2 conda-forge
    markupsafe 2.1.1 py39hb82d6ee_0 conda-forge
    matplotlib 3.4.3 py39hcbf5309_1 conda-forge
    matplotlib-base 3.4.3 py39h581301d_2 conda-forge
    matplotlib-inline 0.1.3 pyhd8ed1ab_0 conda-forge
    mayavi 4.7.4 py39h4a0cae3_1 conda-forge
    mccabe 0.6.1 py_1 conda-forge
    menuinst 1.4.18 py39hcbf5309_1 conda-forge
    mistune 0.8.4 py39hb82d6ee_1005 conda-forge
    mkl 2022.0.0 h0e2418a_796 conda-forge
    more-itertools 8.12.0 pyhd8ed1ab_0 conda-forge
    msgpack-python 1.0.3 py39h2e07f2f_0 conda-forge
    msys2-conda-epoch 20160418 1 conda-forge
    multidict 6.0.2 py39hb82d6ee_0 conda-forge
    mypy_extensions 0.4.3 py39hcbf5309_4 conda-forge
    nbclassic 0.3.7 pyhd8ed1ab_0 conda-forge
    nbclient 0.5.13 pyhd8ed1ab_0 conda-forge
    nbconvert 6.4.4 py39hcbf5309_0 conda-forge
    nbformat 5.2.0 pyhd8ed1ab_0 conda-forge
    nest-asyncio 1.5.4 pyhd8ed1ab_0 conda-forge
    networkx 2.8 pyhd8ed1ab_0 conda-forge
    nodeenv 1.6.0 pyhd8ed1ab_0 conda-forge
    notebook 6.4.10 pyha770c72_0 conda-forge
    notebook-shim 0.1.0 pyhd8ed1ab_0 conda-forge
    numba 0.54.1 py39hb8cd55e_0 conda-forge
    numcodecs 0.9.1 py39h415ef7b_2 conda-forge
    numpy 1.20.3 py39h6635163_1 conda-forge
    openjpeg 2.4.0 hb211442_1 conda-forge
    openmp 5.0.0 vc14_1 conda-forge
    openssl 1.1.1n h8ffe710_0 conda-forge
    packaging 21.3 pyhd8ed1ab_0 conda-forge
    pandas 1.3.4 py39h2e25243_0 conda-forge
    pandoc 2.17.1.1 h57928b3_0 conda-forge
    pandocfilters 1.5.0 pyhd8ed1ab_0 conda-forge
    pango 1.48.10 h33e4779_2 conda-forge
    parso 0.8.3 pyhd8ed1ab_0 conda-forge
    partd 1.2.0 pyhd8ed1ab_0 conda-forge
    pathspec 0.9.0 pyhd8ed1ab_0 conda-forge
    pcre 8.45 h0e60522_0 conda-forge
    pickleshare 0.7.5 py_1003 conda-forge
    pillow 9.0.1 py39ha53f419_2 conda-forge
    pims 0.5 pyh9f0ad1d_1 conda-forge
    pip 22.0.4 pyhd8ed1ab_0 conda-forge
    pixman 0.40.0 h8ffe710_0 conda-forge
    pkginfo 1.8.2 pyhd8ed1ab_0 conda-forge
    platformdirs 2.5.1 pyhd8ed1ab_0 conda-forge
    plotly 5.4.0 pyhd8ed1ab_0 conda-forge
    pluggy 1.0.0 py39hcbf5309_2 conda-forge
    pre-commit 2.15.0 py39hcbf5309_1 conda-forge
    proj 9.0.0 h1cfcee9_1 conda-forge
    prometheus_client 0.13.1 pyhd8ed1ab_0 conda-forge
    prompt-toolkit 3.0.27 pyha770c72_0 conda-forge
    psutil 5.9.0 py39hb82d6ee_0 conda-forge
    pthread-stubs 0.4 hcd874cb_1001 conda-forge
    pugixml 1.11.4 h0e60522_0 conda-forge
    pure_eval 0.2.2 pyhd8ed1ab_0 conda-forge
    py 1.11.0 pyh6c4a22f_0 conda-forge
    py-lief 0.11.5 py39h415ef7b_1 conda-forge
    pycodestyle 2.7.0 pyhd3eb1b0_0
    pycosat 0.6.3 py39hb82d6ee_1009 conda-forge
    pycparser 2.21 pyhd8ed1ab_0 conda-forge
    pyface 7.4.1 pyhd8ed1ab_0 conda-forge
    pyflakes 2.3.1 pyhd8ed1ab_0 conda-forge
    pygments 2.11.2 pyhd8ed1ab_0 conda-forge
    pylint 2.11.1 pyhd8ed1ab_0 conda-forge
    pyopenssl 22.0.0 pyhd8ed1ab_0 conda-forge
    pyparsing 3.0.7 pyhd8ed1ab_0 conda-forge
    pyqt 5.12.3 py39hcbf5309_8 conda-forge
    pyqt-impl 5.12.3 py39h415ef7b_8 conda-forge
    pyqt5-sip 4.19.18 py39h415ef7b_8 conda-forge
    pyqtchart 5.12 py39h415ef7b_8 conda-forge
    pyqtwebengine 5.12.1 py39h415ef7b_8 conda-forge
    pyrsistent 0.18.1 py39hb82d6ee_0 conda-forge
    pysocks 1.7.1 py39hcbf5309_4 conda-forge
    pytest 6.2.5 py39hcbf5309_1 conda-forge
    pytest-cov 3.0.0 pyhd8ed1ab_0 conda-forge
    python 3.9.10 h9a09f29_2_cpython conda-forge
    python-blosc 1.10.2 py39h2e25243_2 conda-forge
    python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
    python-graphviz 0.18.2 pyhaef67bd_0 conda-forge
    python-libarchive-c 4.0 py39hcbf5309_0 conda-forge
    python-suitesparse-graphblas 5.1.10.1 py39h5d4886f_1 conda-forge
    python_abi 3.9 2_cp39 conda-forge
    pytz 2021.3 pyhd8ed1ab_0 conda-forge
    pywavelets 1.3.0 py39h5d4886f_1 conda-forge
    pywin32 303 py39hb82d6ee_0 conda-forge
    pywinpty 2.0.5 py39h99910a6_0 conda-forge
    pyyaml 6.0 py39hb82d6ee_3 conda-forge
    pyzmq 22.3.0 py39he46f08e_1 conda-forge
    qt 5.12.9 h5909a2a_4 conda-forge
    quaternion 2022.4.1 py39h5d4886f_2 conda-forge
    requests 2.27.1 pyhd8ed1ab_0 conda-forge
    ripgrep 13.0.0 h7f3b576_2 conda-forge
    ruamel_yaml 0.15.80 py39hb82d6ee_1006 conda-forge
    scikit-image 0.19.2 py39h2e25243_0 conda-forge
    scikit-learn 1.0.2 py39he931e04_0 conda-forge
    scipy 1.8.0 py39hc0c34ad_0 conda-forge
    send2trash 1.8.0 pyhd8ed1ab_0 conda-forge
    setuptools 59.8.0 py39hcbf5309_0 conda-forge
    six 1.16.0 pyh6c4a22f_0 conda-forge
    slicerator 1.1.0 pyhd8ed1ab_0 conda-forge
    snappy 1.1.8 ha925a31_3 conda-forge
    sniffio 1.2.0 py39hcbf5309_2 conda-forge
    sortedcontainers 2.4.0 pyhd8ed1ab_0 conda-forge
    soupsieve 2.3.1 pyhd8ed1ab_0 conda-forge
    sparse 0.13.0 pyhd8ed1ab_0 conda-forge
    sqlite 3.37.1 h8ffe710_0 conda-forge
    stack_data 0.2.0 pyhd8ed1ab_0 conda-forge
    tbb 2021.5.0 h2d74725_0 conda-forge
    tbb-devel 2021.5.0 h2d74725_0 conda-forge
    tblib 1.7.0 pyhd8ed1ab_0 conda-forge
    tenacity 8.0.1 pyhd8ed1ab_0 conda-forge
    terminado 0.13.3 py39hcbf5309_0 conda-forge
    testpath 0.6.0 pyhd8ed1ab_0 conda-forge
    threadpoolctl 3.1.0 pyh8a188c0_0 conda-forge
    tifffile 2022.4.8 pyhd8ed1ab_0 conda-forge
    tk 8.6.12 h8ffe710_0 conda-forge
    toml 0.10.2 pyhd8ed1ab_0 conda-forge
    tomli 1.2.2 pyhd8ed1ab_0 conda-forge
    toolz 0.11.2 pyhd8ed1ab_0 conda-forge
    tornado 6.1 py39hb82d6ee_2 conda-forge
    tqdm 4.63.0 pyhd8ed1ab_0 conda-forge
    traitlets 5.1.1 pyhd8ed1ab_0 conda-forge
    traits 6.3.2 py39hb82d6ee_1 conda-forge
    traitsui 7.3.1 pyhd8ed1ab_0 conda-forge
    typed-ast 1.5.2 py39hb82d6ee_0 conda-forge
    typing-extensions 4.1.1 hd8ed1ab_0 conda-forge
    typing_extensions 4.1.1 pyha770c72_0 conda-forge
    tzdata 2022a h191b570_0 conda-forge
    ucrt 10.0.20348.0 h57928b3_0 conda-forge
    ukkonen 1.0.1 py39h2e07f2f_1 conda-forge
    urllib3 1.26.9 pyhd8ed1ab_0 conda-forge
    utfcpp 3.2.1 h57928b3_0 conda-forge
    vc 14.2 hb210afc_6 conda-forge
    virtualenv 20.13.4 py39hcbf5309_0 conda-forge
    vs2015_runtime 14.29.30037 h902a5da_6 conda-forge
    vtk 9.1.0 qt_py39h1ab545e_207 conda-forge
    wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
    webencodings 0.5.1 py_1 conda-forge
    websocket-client 1.3.1 pyhd8ed1ab_0 conda-forge
    wheel 0.37.1 pyhd8ed1ab_0 conda-forge
    widgetsnbextension 3.6.0 py39hcbf5309_0 conda-forge
    win32_setctime 1.1.0 pyhd8ed1ab_0 conda-forge
    win_inet_pton 1.1.0 py39hcbf5309_3 conda-forge
    winpty 0.4.3 4 conda-forge
    wrapt 1.13.3 py39hb82d6ee_1 conda-forge
    xorg-kbproto 1.0.7 hcd874cb_1002 conda-forge
    xorg-libice 1.0.10 hcd874cb_0 conda-forge
    xorg-libsm 1.2.3 hcd874cb_1000 conda-forge
    xorg-libx11 1.7.2 hcd874cb_0 conda-forge
    xorg-libxau 1.0.9 hcd874cb_0 conda-forge
    xorg-libxdmcp 1.1.3 hcd874cb_0 conda-forge
    xorg-libxext 1.3.4 hcd874cb_1 conda-forge
    xorg-libxpm 3.5.13 hcd874cb_0 conda-forge
    xorg-libxt 1.2.1 hcd874cb_2 conda-forge
    xorg-xextproto 7.3.0 hcd874cb_1002 conda-forge
    xorg-xproto 7.0.31 hcd874cb_1007 conda-forge
    xz 5.2.5 h62dcd97_1 conda-forge
    yaml 0.2.5 h8ffe710_2 conda-forge
    yarl 1.7.2 py39hb82d6ee_1 conda-forge
    zarr 2.11.1 pyhd8ed1ab_0 conda-forge
    zeromq 4.3.4 h0e60522_1 conda-forge
    zfp 0.5.5 h0e60522_8 conda-forge
    zict 2.1.0 pyhd8ed1ab_0 conda-forge
    zipp 3.7.0 pyhd8ed1ab_1 conda-forge
    zlib 1.2.11 h8ffe710_1013 conda-forge
    zstd 1.5.2 h6255e5f_0 conda-forge

ParticularMiner, May 11 '22 06:05

@GenevieveBuckley

After updating the Python package pims and applying the fix you provided here, I was able to use dask.array.image.imread() to open the video file! Thanks!

However, I noticed that dask.array.image.imread() added a new dimension to the resulting array so that I got five dimensions instead of four. The first dimension had a length of 1; the second was the number of frames in the video; the third and fourth were the image dimensions; and the fifth was the color channel.

Though obviously I can "squeeze" the first dimension out, I did not expect it in the first place since imread()'s docstring says that a "dask array of all images stacked along the first dimension" will be returned. Besides, the array had only one chunk, which potentially puts the RAM at risk for large video files. In contrast, dask-image.imread.imread() does not have these issues.

ParticularMiner, May 12 '22 14:05

> However, I noticed that dask.array.image.imread() added a new dimension to the resulting array so that I got five dimensions instead of four. The first dimension had a length of 1; the second was the number of frames in the video; the third and fourth were the image dimensions; and the fifth was the color channel.
>
> Though obviously I can "squeeze" the first dimension out, I did not expect it in the first place since imread()'s docstring says that a "dask array of all images stacked along the first dimension" will be returned. Besides, the array had only one chunk, which potentially puts the RAM at risk for large video files. In contrast, dask-image.imread.imread() does not have these issues.

Hm, yes. It looks like dask.array.image.imread will have one chunk per file on disk, whereas dask_image.imread.imread will chunk along the pims frames. I don't think there's a solid reason for it being that way. Neither set of developers works much with movie files (at least, I don't often), so it's possible no-one has really considered it.
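A rough size estimate shows why one chunk per file matters for video. This sketch assumes the test video described in the traceback above (851 frames of 1920 x 1080 uint8 pixels) and additionally assumes 3 colour channels; the chunk shapes follow the description in this thread (including dask_image's default of one frame per chunk) rather than being read from either library.

```python
from math import prod


def chunk_nbytes(shape, itemsize=1):
    """Number of bytes in one chunk of `shape` (itemsize=1 for uint8)."""
    return prod(shape) * itemsize


# Sizes from the traceback above; the 3 colour channels are an assumption.
frames, height, width, channels = 851, 1080, 1920, 3

# dask.array.image.imread: one chunk per file, stacked along a new first axis
per_file_chunk = (1, frames, height, width, channels)
# dask_image.imread.imread: one chunk per frame
per_frame_chunk = (1, height, width, channels)

print(chunk_nbytes(per_file_chunk))   # 5293900800 (about 5.3 GB)
print(chunk_nbytes(per_frame_chunk))  # 6220800 (about 6.2 MB)
```

With one chunk per file, a single task has to hold the whole decoded video in memory at once, whereas per-frame chunks keep each task down to a single frame.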

GenevieveBuckley, May 13 '22 01:05

> Since dask-image.imread() uses pims.open(), it would be great if it could mirror such functionality too.
>
> pims.ImageIOReader.class_priority = 100  # we set this very high in order to force dask's imread() to use this reader [via pims.open()]
> rgb_frames = dask_image.imread.imread('/path/to/video/file.mpg')  # uses ImageIOReader

This does work, from what I can see. It's just difficult to tell, because pims.open() hides away which handler is being used at any particular time.

This is how I checked:

  1. Pip install an editable version of pims in my environment (using this bugfix branch, since you want to use the ImageIOReader).
  2. Add a print statement, print(handler), right above line 201 here to see which handlers get tried.
  3. Make sure both PyAV and imageio-ffmpeg are installed. By default, PyAV has a higher class priority.
  4. Start a new Python session and open the test video file with dask_image. Because the handler name is printed, I can see that pims uses the PyAV reader internally to open the file.
  5. Adjust the class priority: import pims; pims.ImageIOReader.class_priority = 100
  6. Open the test video file again with dask_image, and note that pims now uses the ImageIOReader to open the file.

GenevieveBuckley, May 13 '22 02:05

To summarize, this thread brings up two points:

  1. It would be good if the pims class priority could be used together with dask-image. I believe this is already possible, discussed in this comment.
  2. The dask.array.image.imread docs are slightly misleading. I've opened https://github.com/dask/dask/pull/9082 to address that point.

Is there anything else I've missed, or you're still having trouble with?

GenevieveBuckley, May 13 '22 02:05

Thanks for your reply.

Sorry, it seems my original post was not clear. What I meant was that I already knew the following code snippet works with the single-threaded and multi-threaded schedulers, but not with multi-process schedulers, and probably not with distributed-memory schedulers either.

import dask_image.imread
import pims

pims.ImageIOReader.class_priority = 100  # we set this very high in order to force dask's imread() to use this reader [via pims.open()]
rgb_frames = dask_image.imread.imread('/path/to/video/file.mpg')  # uses ImageIOReader

rgb_frames.compute(scheduler='single-threaded')  # works
rgb_frames.compute(scheduler='threading')  # works
rgb_frames.compute(scheduler='processes')  # does not work: the worker processes still use the default priority

ParticularMiner, May 13 '22 07:05

Yeah, the single-threaded and threaded schedulers share the same process memory. So if the priority is set in that process, that is sufficient: all workers see that same memory.

With the process scheduler, each process has its own memory space, and that memory isn't shared. Setting information in one process does not necessarily get communicated to another, so one would need to do this during process startup. This is handled here. The simplest solution is to provide your own ProcessPoolExecutor, created with whatever initializer you need. I would follow this for guidance on setting it up.

An alternative solution would be to add some kind of dask.config parameter (like these) that allows one to change the initializer. This could be a totally different function that is run instead, or perhaps one that gets run as part of the existing initializer, later on in that function. This is probably a reasonable PR to do if you wanted to go that way.

Distributed would likely have the same issue for the same reason. However, there are a lot more options there; for example, preload scripts would work. If you are planning on doing process-based execution, I would suggest just using Distributed. It has a centralized scheduler, the ability to work with Futures, a rich diagnostic dashboard, etc. Generally this will be a better experience. It will also be easier to go from there to a cluster, the cloud, etc. as needed.

jakirkham, May 13 '22 08:05

Many thanks @jakirkham !

> The simplest solution is to just provide your own ProcessPoolExecutor ...

I followed your first suggestion since that was the easiest one to understand (as you guessed 😄). And it works (see the following code-snippet)!

import dask_image.imread
import pims


def initialize_worker_process():
    """
    Initialize a worker process before running any tasks in it.
    """
    # If NumPy is already imported, presumably its random state was
    # inherited from the parent => re-seed it.
    import sys

    np = sys.modules.get("numpy")
    if np is not None:
        np.random.seed()

    # Increase the priority of ImageIOReader in this worker so that
    # dask_image's imread() uses this reader [via pims.open()]
    pims.ImageIOReader.class_priority = 100


def get_pool_with_reader_priority_set(num_workers=None):
    import os
    from concurrent.futures import ProcessPoolExecutor

    from dask import config
    from dask.multiprocessing import get_context
    from dask.system import CPU_COUNT

    num_workers = num_workers or config.get("num_workers", None) or CPU_COUNT
    if os.environ.get("PYTHONHASHSEED") in (None, "0"):
        # This number is arbitrary; it was chosen to commemorate
        # https://github.com/dask/dask/issues/6640.
        os.environ["PYTHONHASHSEED"] = "6640"
    context = get_context()
    return ProcessPoolExecutor(
        num_workers, mp_context=context, initializer=initialize_worker_process
    )


# The main guard is needed when this runs as a script and workers are
# started with "spawn" (e.g. on Windows), since spawned workers re-import it.
if __name__ == "__main__":
    rgb_frames = dask_image.imread.imread('/path/to/video/file.mpg')
    # Using the pool as a context manager shuts its worker processes down afterwards
    with get_pool_with_reader_priority_set() as pool:
        rgb_frames.compute(scheduler='processes', pool=pool)  # uses ImageIOReader

I suppose a PR that helps the end-user avoid getting his/her hands dirty with the innards of multi-process scheduler technology would be a good idea.

But before that, perhaps I should try dask.distributed ...

ParticularMiner, May 13 '22 12:05

@jakirkham

Would the following idea sit well with you?

The idea is to add a new keyword argument, say initializer=None (intended to be a callable) to dask.multiprocessing.get():

https://github.com/dask/dask/blob/137206dc04eb62424617a068405545b26db99a6f/dask/multiprocessing.py#L145-L156

so that we can later replace the following call currently within its body:

        pool = ProcessPoolExecutor(
            num_workers, mp_context=context, initializer=initialize_worker_process
        )

with:

        pool = ProcessPoolExecutor(
            num_workers, mp_context=context, initializer=initializer or initialize_worker_process
        )

This would enable the end-user to pass his/her own process initializer function to compute() or dask.config.set() (if using a context manager).
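One detail that may be worth firming up: with `initializer or initialize_worker_process`, a user-supplied function would replace dask's default initializer (which re-seeds NumPy's random state, as in the code earlier in this thread) rather than run alongside it. Below is a minimal sketch of chaining the two instead, using a hypothetical helper `combine_initializers` that is not part of dask; `default_init` and `user_init` are stand-ins.

```python
from functools import partial


def _run_initializers(initializers):
    """Call each initializer in order, skipping any that are None."""
    for init in initializers:
        if init is not None:
            init()


def combine_initializers(*initializers):
    """Return a single callable that runs all `initializers` in order.

    A functools.partial over a module-level function can be pickled, which
    matters because spawn-based process pools pickle the initializer; a
    closure or lambda could not be pickled with the standard pickle module.
    """
    return partial(_run_initializers, initializers)


calls = []


def default_init():  # stand-in for dask's initialize_worker_process
    calls.append("default")


def user_init():  # stand-in for a function that raises ImageIOReader's priority
    calls.append("user")


init = combine_initializers(default_init, user_init)
init()
print(calls)  # ['default', 'user']
```

The combined callable could then be passed as `initializer=` to `ProcessPoolExecutor`, so the default set-up and the user's reader priority both run in every worker.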

ParticularMiner, May 13 '22 13:05

That seems like a reasonable starting point. There may be a few things to firm up, but it is probably easier to discuss these in a PR. Would suggest sending a draft PR to Dask and we can go from there 🙂

jakirkham, May 13 '22 19:05

Initializer customization was added in PR https://github.com/dask/dask/pull/9087, which should be in the next Dask release.

jakirkham, May 19 '22 20:05