dask-image
Getting movie files into dask efficiently
- dask-image version: 0.2.0
- Python version: 3.7
- Operating System: Mac OSX
Description
I'm interested in getting movie files (.mov, .mpeg, .avi, basically anything readable with ffmpeg) into dask in a nice way, i.e. something like dask_image.imread.imread but able to accept these formats.
It is possible to read these formats into Python via ffmpeg using libraries like imageio.imread or pyav, but these tend to return video objects that expose iterators or get-frame methods. What I would like instead is a dask array that I can index lazily to get just the frames I need, with high performance.
Note that there has been some discussion around this on an image.sc post I made, including caveats about attempting full random access in movie files. I am fine with caching intermediate results to make accessing neighboring frames fast, and I'm fine if big jumps in the movie are slow, but accessing nearby frames should be fast. I'm interested in using this for interactive movie visualisation in napari, so it is reasonable to expect that people will usually look at frames in order, but they might want to jump around, and things should be cached nicely then too.
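The access pattern described above (cheap steps to neighboring frames, expensive jumps) maps naturally onto caching decoded chunks of frames rather than single frames. The following is a minimal sketch of that idea, not code from dask-image, PIMS or napari: `ChunkedFrameCache` and the `decode_chunk(start, stop)` callback are hypothetical names, and `fake_decode` only stands in for a real decoder (for example one that seeks once to `start` and then decodes forward).

```python
from collections import OrderedDict
from typing import Callable

import numpy as np


class ChunkedFrameCache:
    """Serve single frames from an LRU cache of decoded chunks of frames.

    `decode_chunk(start, stop)` is a hypothetical callback that decodes frames
    [start, stop) in order and returns an array of shape (stop - start, ...).
    """

    def __init__(self, decode_chunk: Callable[[int, int], np.ndarray],
                 n_frames: int, chunk_size: int = 32, max_chunks: int = 8):
        self.decode_chunk = decode_chunk
        self.n_frames = n_frames
        self.chunk_size = chunk_size
        self.max_chunks = max_chunks
        self._chunks = OrderedDict()  # chunk index -> decoded frames, oldest first
        self.decode_calls = 0  # number of times the (slow) decoder has run

    def _get_chunk(self, chunk_index: int) -> np.ndarray:
        if chunk_index in self._chunks:
            # Cache hit: mark this chunk as the most recently used one.
            self._chunks.move_to_end(chunk_index)
            return self._chunks[chunk_index]
        # Cache miss: decode the whole chunk containing the requested frame.
        start = chunk_index * self.chunk_size
        stop = min(start + self.chunk_size, self.n_frames)
        chunk = self.decode_chunk(start, stop)
        self.decode_calls += 1
        self._chunks[chunk_index] = chunk
        if len(self._chunks) > self.max_chunks:
            # Evict the least recently used chunk.
            self._chunks.popitem(last=False)
        return chunk

    def __getitem__(self, i: int) -> np.ndarray:
        if not 0 <= i < self.n_frames:
            raise IndexError(i)
        return self._get_chunk(i // self.chunk_size)[i % self.chunk_size]

    def __len__(self) -> int:
        return self.n_frames


def fake_decode(start, stop):
    # Stand-in for a real decoder: frame k is a small RGB image filled with k.
    return np.stack([np.full((4, 6, 3), k, dtype=np.uint8)
                     for k in range(start, stop)])


frames = ChunkedFrameCache(fake_decode, n_frames=100, chunk_size=10, max_chunks=2)
for i in range(20):  # scroll through frames 0-19 in order
    frames[i]
print(frames.decode_calls)  # 2: twenty frames only needed two chunk decodes
```

Scrolling in order hits the cache for most frames, while a jump to a distant frame costs one chunk decode and may evict an older chunk. Letting napari slice such an object directly would need more of the array interface (shape, dtype, tuple indexing), which this sketch leaves out.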
What I Did
I made some attempts at this myself by modifying the dask_image.imread.imread code (see here):
```python
import imageio
import numpy as np
import dask.array as da
from dask import delayed
from dask.cache import Cache

cache = Cache(2e9)  # Leverage two gigabytes of memory
cache.register()


def dask_from_mov(path):
    vid = imageio.get_reader(path, 'ffmpeg')
    # imageio reports size as (width, height); reverse it and add an RGB axis
    shape = vid.get_meta_data()['size'][::-1] + (3,)
    # One delayed task per frame, all sharing the same reader object
    lazy_imread = delayed(vid.get_data)
    return da.stack([
        da.from_delayed(lazy_imread(i), shape=shape, dtype=np.uint8)
        for i in range(vid.count_frames())
    ])
```
There are more code snippets and links to some .mov files in the image.sc post linked above if people want more detail.
Overall performance of that approach was not very good. I can do some benchmarking etc., but I suspect that what I'm doing is horribly inefficient from a decoding standpoint, and that there might be a lower level of the ffmpeg reader that dask could connect to. Curious if anyone here has any experience with this or ideas?
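One plausible reason the snippet above is slow is that it builds one task per frame, all calling `get_data` on a single shared ffmpeg reader, so out-of-order access may force repeated seeking and decoding (and a shared reader may not be safe to call from several threads at once). A hedged alternative is to make each dask chunk a contiguous block of frames that one task decodes in order. In this sketch, `dask_from_chunked_reader` and `read_range(start, stop)` are hypothetical names; a real `read_range` might open its own imageio or PyAV reader, seek once to `start` and decode forward. `fake_read_range` only exists to make the example runnable.

```python
import dask.array as da
import numpy as np
from dask import delayed


def dask_from_chunked_reader(read_range, n_frames, frame_shape,
                             dtype=np.uint8, frames_per_chunk=32):
    """Build a lazy (n_frames, *frame_shape) dask array, one task per block of frames.

    `read_range(start, stop)` is a hypothetical function that decodes frames
    [start, stop) in order and returns an array of shape (stop - start, *frame_shape).
    """
    blocks = []
    for start in range(0, n_frames, frames_per_chunk):
        stop = min(start + frames_per_chunk, n_frames)
        # One delayed call decodes a contiguous run of frames sequentially.
        blocks.append(da.from_delayed(delayed(read_range)(start, stop),
                                      shape=(stop - start, *frame_shape),
                                      dtype=dtype))
    # Join the blocks along the time axis; each block becomes one dask chunk.
    return da.concatenate(blocks, axis=0)


def fake_read_range(start, stop):
    # Stand-in for a real decoder: frame k is a small RGB image filled with k.
    return np.stack([np.full((4, 6, 3), k, dtype=np.uint8)
                     for k in range(start, stop)])


movie = dask_from_chunked_reader(fake_read_range, n_frames=100,
                                 frame_shape=(4, 6, 3), frames_per_chunk=32)
print(movie.chunks[0])  # (32, 32, 32, 4)
```

The main knob is `frames_per_chunk`: larger chunks amortize seeking better, but a request for a single frame then decodes the whole block. With `dask.cache.Cache` registered as above, a recently computed block may be served from memory when neighboring frames are requested again.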
Hi @sofroniewn, I like what you're doing here, it's interesting and useful.
dask-image uses the pims library for reading images (I'm not 100% sure on this, but I think pims was chosen because it can support bioformats). Relevant to you, pims also has an ffmpeg reader included: https://github.com/soft-matter/pims/blob/master/pims/ffmpeg_reader.py Docs for this are here: http://soft-matter.github.io/pims/v0.3.3/video.html
You may find that this is still too slow for your tastes, and there's some discussion around this for static image files at https://github.com/dask/dask-image/issues/121. I haven't done any benchmarking of image/movie readers at all, so this would be a very valuable thing to do.
One other place you might try looking for advice could be the fastai forums, for similar reasons to those suggested by sdvillal over at your post on image.sc. It looks like most things rely on ffmpeg or opencv at the base, and the newest, fanciest things aim to speed things up by moving even the data loading operations onto the GPU (e.g. nvvl.VideoLoader, which is part of DALI). GPU-specific things won't be helpful for you (at least, not for a very long time: https://github.com/dask/dask-image/issues/133), but it's possible someone is familiar with hooking into ffmpeg at a lower level.
Hi all. I am a PIMS maintainer and author of its video readers. I recommend the reader backed by PyAV https://github.com/soft-matter/pims/blob/master/pims/pyav_reader.py over the reader backed by FFmpeg linked above. The FFmpeg reader starts ffmpeg in a subprocess and receives data over a pipe. This is a common approach but not a very robust one. The PyAV reader instead uses Cython bindings to FFmpeg, which is much cleaner. It may be useful to note that conda-forge has conda packages for pyav.
See https://scikit-image.org/docs/dev/user_guide/video.html for more. Disclaimer: it expresses similar opinions, but that's because I wrote it. :-D
> I'm not 100% sure on this, but I think pims was chosen because it can support bioformats
I heard second-hand that PIMS was chosen because it defers computation more than other readers and is thus suitable for dask, but I can't speak to that authoritatively.
That's right 😄 (authoritatively confirmed 😉)
Thanks all for the advice. I've started using PIMS with PyAV now via dask_image.imread. I did run into a couple of problems, see https://github.com/soft-matter/pims/issues/332#issuecomment-598853644, but installing PIMS from master helps.
I still haven't done any benchmarking, but scrolling through these dask_image-loaded movies in napari is still slow compared to scrolling through in-memory movies in napari or through movies in QuickTime, so there's still something for me to understand here. I'll dig in more. Also note that for napari integration it is much better to use headless OpenCV, as you avoid weird Qt conflicts; see this note: https://github.com/napari/napari/issues/1026#issuecomment-595592031
Thanks, that's useful to hear. PIMS and napari have started talking more recently and I expect that to continue, so hopefully we can work together to smooth this out.
That would be great @danielballan. The next napari release (0.3.0, expected in under 2 weeks) will be the first one that supports reader plugins, added by @tlambert03. We've got the basic machinery merged into master and are now working on a few details and documentation, see https://github.com/napari/napari/pull/1030. At that point I'd love to see both PIMS and dask-image be able to load data into napari via our plugin mechanism (which is hopefully pretty lightweight and not too far from where you are now).
Is this more or less resolved for you now @sofroniewn?
I do still find loading movie files with dask slower than I would hope, but until I do more thorough benchmarking or deeper investigation, you can close this issue, and I'll reopen it if further discussion is warranted.
Would you be able to sketch out or point to how you are handling movies today, Nick? 🙂