imageio-ffmpeg unbuffered reads from ffmpeg

I don't have much time to test or work on this, but I think the code that reads data from ffmpeg can be optimized slightly.

The inefficiencies probably stem from the call to read_n_bytes, which reads the data into a bytes string (an immutable type) and then converts it to a numpy array by copying the memory.
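To illustrate the extra copy, here is a minimal sketch. It assumes an in-memory `io.BytesIO` as a stand-in for ffmpeg's stdout and a made-up tiny frame size; neither comes from imageio itself. It contrasts the read-then-copy pattern with filling a preallocated array in place.

```python
import io

import numpy as np

# Hypothetical tiny frame (h=4, w=3, depth=3), not the real frame size.
framesize = 4 * 3 * 3
# Stand-in for the ffmpeg stdout pipe (an assumption for illustration).
stream = io.BytesIO(bytes(range(framesize)))

# Read-then-copy pattern: read() returns immutable bytes, so frombuffer
# yields a read-only view and .copy() is needed to get a writable array.
raw = stream.read(framesize)
view = np.frombuffer(raw, dtype=np.uint8)
copied = view.copy()

# Alternative: let the stream write straight into a preallocated array,
# which avoids the intermediate bytes object and the extra copy.
stream.seek(0)
frame = np.empty(framesize, dtype=np.uint8)
nread = stream.readinto(frame)
```

Whether the second form is measurably faster on a real pipe depends on the workload, so it should be benchmarked rather than assumed.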

For my 1048x1328x3 frames, the change sped up reading in a tight loop from 283 fps to 293 fps.

Marginal gain, but maybe somebody needs it. Maybe this can make a bigger difference depending on the workload/hardware/decoding process.

Here is a rough sketch of the patch. You need to set the input to unbuffered; see the numpy bug report linked below.

Patch skeleton

diff --git a/imageio/plugins/ffmpeg.py b/imageio/plugins/ffmpeg.py
index 83f9a9f..5d81546 100644
--- a/imageio/plugins/ffmpeg.py
+++ b/imageio/plugins/ffmpeg.py
@@ -471,7 +471,8 @@ class FfmpegFormat(Format):
             # For Windows, set `shell=True` in sp.Popen to prevent popup
             # of a command line window in frozen applications.
             self._proc = sp.Popen(
-                cmd, stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.PIPE, shell=ISWIN
+                cmd, stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.PIPE, shell=ISWIN,
+                bufsize=0,
             )
 
             # Create thread that keeps reading from stderr
@@ -611,7 +612,9 @@ class FfmpegFormat(Format):
             w, h = self._meta["size"]
             framesize = self._depth * w * h * self._bytes_per_channel
             assert self._proc is not None
-
+            s = np.fromfile(self._proc.stdout, dtype=np.uint8,
+                count=framesize)
+            return s, True
             try:
                 # Read framesize bytes
                 if self._frame_catcher:  # pragma: no cover - camera thing
@@ -644,8 +647,9 @@ class FfmpegFormat(Format):
             w, h = self._meta["size"]
             # t0 = time.time()
             s, is_new = self._read_frame_data()
-            result = np.frombuffer(s, dtype=self._dtype).copy()
-            result = result.reshape((h, w, self._depth))
+            result = s.reshape(h, w, self._depth)
+            # result = np.frombuffer(s, dtype=self._dtype).copy()
+            # result = result.reshape((h, w, self._depth))
             # t1 = time.time()
             # print('etime', t1-t0)

I'm not too sure of the other performance implications of buffered vs unbuffered reads. Investigating that will take more time than I have. Maybe other parts of the code can benefit from this kind of stuff.

https://github.com/numpy/numpy/issues/12309

Nov 02 '18 13:11 hmaarrfk

Note that after imageio/imageio#425 most changes will apply to imageio_ffmpeg. But I can imagine you need some (numpy) magic in imageio as well, so leaving the issue here for now.

Feb 05 '19 14:02 almarklein

Transferred this issue from imageio to imageio-ffmpeg. Could this be as simple as adding a bufsize arg to our functions?
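As a rough sketch of what passing `bufsize=0` could look like: the example below uses a small Python child process as a stand-in for ffmpeg, and a hypothetical helper `read_frame` that fills a preallocated array with `readinto`. This swaps in `readinto` for the `np.fromfile` call from the patch above, and none of these names are part of imageio-ffmpeg.

```python
import subprocess as sp
import sys

import numpy as np


def read_frame(stdout, framesize):
    """Fill a preallocated uint8 array from a raw (unbuffered) pipe.

    Hypothetical helper for illustration only.
    """
    frame = np.empty(framesize, dtype=np.uint8)
    view = memoryview(frame)
    filled = 0
    while filled < framesize:
        # A raw pipe may return fewer bytes than requested, so loop.
        n = stdout.readinto(view[filled:])
        if not n:
            raise EOFError("pipe closed before a full frame was read")
        filled += n
    return frame


# Stand-in for ffmpeg: a child process that writes 12 raw bytes.
child = "import sys; sys.stdout.buffer.write(bytes(range(12)))"
proc = sp.Popen([sys.executable, "-c", child], stdout=sp.PIPE, bufsize=0)

# With bufsize=0, proc.stdout is an unbuffered raw stream.
frame = read_frame(proc.stdout, 2 * 2 * 3).reshape(2, 2, 3)
proc.stdout.close()
proc.wait()
```

The loop matters because an unbuffered read can return a partial frame; a single `readinto` call is not guaranteed to fill the whole array.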

Feb 18 '20 13:02 almarklein

I tried a while back. Not too sure why it didn't work.

It may have been something to do with numpy, to be honest. I had a PR in flight with them that I never got the tests to pass for.

Feb 18 '20 16:02 hmaarrfk

I'm going to close this, as I think that users who need this additional performance from Python should probably look at the new pyav plugin.

Jan 01 '23 02:01 hmaarrfk