
Make `.detect_video` more memory efficient

Open ejolly opened this issue 3 years ago • 0 comments

@ljchang after chatting with @TiankangXie it looks like we can fairly easily roll our own read_video function because torch also provides a lower level API with their VideoReader class.

Just like in their examples, we can write a function that wraps the next(reader) calls and returns a generator, so we load at most batch_size frames into memory on each loop iteration. That way even long videos shouldn't be a problem on low RAM/VRAM machines, and more memory will simply allow for bigger batch sizes.
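A rough sketch of the batching generator described above. The name `iter_batches` is hypothetical (it is not an existing py-feat or torchvision function), and the code is deliberately reader-agnostic: it only assumes that the reader is an iterable yielding one frame per `next()` call.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(frames: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Lazily yield lists of at most `batch_size` items from `frames`.

    Hypothetical helper: only one batch is materialized at a time, so peak
    memory scales with `batch_size` rather than with the video length.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(frames)
    while True:
        # islice pulls at most `batch_size` items via repeated next() calls
        batch = list(islice(it, batch_size))
        if not batch:
            return  # reader exhausted
        yield batch


# Intended use (assumption, not run here): pass a torchvision VideoReader as
# `frames`. Its items are expected to be dicts holding the frame tensor under
# "data", so a batch could be stacked with
# torch.stack([f["data"] for f in batch]) before detection.
```

Because nothing is read until the caller asks for the next batch, a bigger `batch_size` simply trades memory for fewer loop iterations.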

The downside is that, right now, torchvision needs to be compiled with video_reader support, which in turn requires a working ffmpeg install:

*** RuntimeError: Not compiled with video_reader support, to enable video_reader support, please install ffmpeg (version 4.2 is currently supported) and build torchvision from source.
Traceback (most recent call last):
  File "/Users/Esh/anaconda3/envs/py-feat/lib/python3.8/site-packages/torchvision/io/__init__.py", line 130, in __init__
    raise RuntimeError(

So it seems like the real cost of rolling our own solution with VideoReader, until torch offers a more memory-efficient read_video(), is an added dependency on ffmpeg and potentially more installation hassle. Or we can try a different library or solution for loading video frames. From a brief search on GitHub, it looks like there are lots of custom solutions in third-party libraries, because this isn't quite "solved." But most libraries "cheat" a bit IMO, e.g. by expecting that you've pre-saved each frame as a separate image file on disk.

ejolly · Sep 27 '22 22:09